ABSTRACT


This paper presents a system of automatic error recovery for
syntax-directed parsing algorithms which is based solely on the syntax of
the language. This system of automatic error recovery uses a table of those
symbols which can follow a construct to determine how that construct might
be inserted. Four compilers built with this system are described, along with
examples of the error recovery. The translator writing system in which this
system of automatic error recovery has been developed is discussed, including
the syntax description language, the generation of Floyd productions, and the
parsing table built from the Floyd productions. Finally, the author presents
suggestions for improving the error recovery in compilers built using either
of the two parsing algorithms (Floyd productions and recursive descent), and
for research into further extensions. Possible applications of the techniques
described in this paper include extensible compilers and compilers for
computer science education.
CHAPTER 1. INTRODUCTION

1.1 Purpose of This Work

The purpose of the research described in this thesis is to
provide an automatic error recovery system which is based on the syntax
of a language and which can operate without any additional effort on
the part of the language designer. These objectives seem to have been
realized. Further, the system described comes reasonably close to the
ultimate goal of finding all the syntactic errors in a program and
describing them precisely to the programmer in order to minimize the
effort required of him to achieve a syntactically correct program.
Batch processing compilation is the context in which this system operates;
however, the techniques used could equally well be applied to an on-line
compilation system.

The error recovery system that is described has been built into
a compiler building system based on a modification of Floyd's production
language (40), but it is extended to a recursive descent parsing algorithm
as well. Tests with several different compilers implemented on
this system have shown it to be superior to each of the systems with which
it is compared, both in effectiveness and in clarity to the programmer.
Systems with which the present one is compared include the Burroughs B5500
ALGOL compiler, the 7090 ALCOR Illinois ALGOL compiler, and one other
compiler with hand-written error recovery mechanisms.

Although one can identify five levels of errors which a programmer
must overcome to be able to have his program perform correctly,
this research attempts to provide automatic and effective mechanisms
only for the third level, that of syntactic errors. The first level is
that of keypunching errors which prevent the job from even getting on
the computer. The second level is that of errors in the job description
to the operating system so that no processing can be done. The third
level is syntactic errors in the compilation which prevent the object
program from running. The fourth level is that of fatal run time errors
such as divide-by-zero which terminate the job prematurely. The fifth
level is that of logical errors which cause the results of the job to
be erroneous even though it is able to run to completion.

1.2 Contributions of Other Research

There has been a considerable amount of work done on giving
the programmer help with errors at levels four and five, but he has not
been given as much help with level three, syntactic errors. There
has been almost no work done on automatic syntactic error recovery.

The work on debugging, that is, on finding level four and five errors,
has taken quite varied approaches. Chapin (18) has proposed a change
in machine design specifically for more effective debugging. Bayer (6)
and Hext (52) have considered matters relating to program failures
(level four errors). Some theoretical concepts have been discussed by
Constantine (22), Green (47), and Van Horn (93), while some specific
debugging systems have been proposed by Balzer (5), Grishman (51),
Kulsrad (64), and Zadrevskii (100). Fuchi, et al. (46) have considered a
debugging system which operates by simulation, or simulated execution
of the program. The most popular approach has been that of on-line
debugging. Examples of this are given in Baecker (4), Bernstein and
Owens (10), Brady (11), Dean (25), Evans and Darley (35), Josephs (58),
Jossen (59), and Zimmerman (101). Irving and Morrison (57) and Pullen
and Shuttee (78) have considered debugging in some special situations,
and Levy (69) and Varley (94) have considered data-related errors.

There have been many compilers constructed for practical
application, and each of these has had to do something about syntax errors.
Some of these have handled the situation quite well even though the measures
used have been ad hoc. Compiler building systems have tended to be less
satisfactory with regard to syntactic error recovery. All but two have
required a certain amount of error recovery information to be provided by
the language designer. In some cases the systems have depended entirely
on the language designer for any error recovery. Feldman and Gries (38)
have said "the problem of automatic recovery from syntax errors could use
considerably more attention" (p. 107) and "There has been very little
effort on the problems of automatic error detection and recovery in
syntax-directed processors. Once again, even a bad system would be of great
value to users." (p. 108) The two compiler building systems which have had
error recovery built into them independent of the language are those of
Irons (55) and Leinius (68).

Irons' error correcting algorithm is based on a multiple tracking
top-down parse algorithm, i.e. goals are established and each possible
branch is extended to the next subgoal until either all the goals of one
branch are found or all branches terminate. If all currently active
branches terminate without finding their respective goals, then the error
recovery procedure is called. This procedure attempts to fix the source
string by insertion, deletion, or replacement. It makes a list of all
syntactic elements which can follow on any branch of the last set of parsing
branches. It then scans the input until it comes to one of the items in the list.
The string is then fixed in the simplest way which will allow this symbol
to be connected to the previous parse. Informally, Irons has stated
(personal communication) that this procedure has done the correct thing in
about 20% of the cases.

Leinius based his method on a simple precedence parsing
algorithm. He associates with the simple precedence parser an error
recovery procedure which attempts to find the smallest substring containing
the error which can be reduced. Although it is not clear from the material
which was available at the time of this writing, this appears to be a
more sophisticated version of the commonly used concept of throwing away
symbols until something recognizable is found. He further says that when
two or more syntactically valid recovery actions can occur, more text is
examined to decide which to choose. He concludes with a
discussion of the extension of his method to the class of LR(k) languages.

Other compiler building systems leave the error recovery either
partially or entirely up to the particular language designer. For example,
McKeeman (72) expects the user to initialize an array called STOPIT with
an appropriate set of terminal symbols. When an error is found, an XPL
compiler skips symbols until it comes to one of the symbols in STOPIT. It
then makes a reduction appropriate for the symbol at which the skipping
stopped, i.e. it finds a reduction which can be followed by that symbol
or can include it as the last symbol. Simpson (88) also skips to an
appropriate symbol.
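The skipping step of this kind of designer-assisted recovery can be sketched as follows. This is a minimal Python illustration, not McKeeman's XPL code: the token list and the contents of the STOPIT set are hypothetical, and the reduction made afterwards is not shown.

```python
from typing import Iterable, List, Tuple

def skip_to_stop_symbol(tokens: List[str], pos: int,
                        stopit: Iterable[str]) -> Tuple[int, List[str]]:
    """Discard tokens starting at `pos` until one in `stopit` is reached.

    Returns the index of the stop symbol (len(tokens) if none is found)
    and the skipped tokens, which a compiler might list in its message.
    """
    stop = set(stopit)
    skipped: List[str] = []
    while pos < len(tokens) and tokens[pos] not in stop:
        skipped.append(tokens[pos])
        pos += 1
    return pos, skipped

# Hypothetical STOPIT contents and an error detected at the "*" (index 4).
STOPIT = {";", "END"}
tokens = ["x", ":=", "3", "+", "*", "4", ";", "y", ":=", "1"]
print(skip_to_stop_symbol(tokens, 4, STOPIT))  # (6, ['*', '4'])
```

Parsing would then resume with a reduction that can be followed by, or ends with, the ";" at index 6. Everything between the error and the stop symbol is lost, which is why any further errors in that stretch go unreported.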
Wirth (98) discusses the handling of syntactic errors in
PL360 from the point of view of a precedence relation parsing
algorithm, and hence the method used is not limited to PL360. However, the
method he outlines is based on information provided by the language
designer. When no precedence relation is defined between two symbols, an
error is detected. A set of symbols, which the language designer has
provided, is searched for one for which there exists a precedence relation
between it and each of the two symbols which had no relation between
them. This symbol is then inserted between the two symbols. If no
symbol is found, a table of erroneous productions, again provided by the
language designer, is searched for one that applies. Associated with it
is an appropriate message provided by the language designer. Nothing
was said about what happens when no production from the table applies.
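The insertion step can be sketched as follows. This is a Python illustration under stated assumptions, not Wirth's PL360 code: the relation table is a dictionary keyed by symbol pairs, the candidate list stands for the designer-supplied set, and "a relation between it and each of the two symbols" is read as a relation from the left symbol to the candidate and from the candidate to the right symbol. The fallback search of the erroneous-production table is omitted.

```python
from typing import Dict, List, Optional, Tuple

# A precedence relation is one of "<", "=", ">"; pairs with no entry have no relation.
Relations = Dict[Tuple[str, str], str]

def has_relation(relations: Relations, left: str, right: str) -> bool:
    return (left, right) in relations

def find_insertion(left: str, right: str, relations: Relations,
                   candidates: List[str]) -> Optional[str]:
    """Return the first designer-supplied candidate c such that left/c and
    c/right both have a precedence relation, or None if none qualifies."""
    if has_relation(relations, left, right):
        return None  # the two symbols are related: no error here
    for c in candidates:
        if has_relation(relations, left, c) and has_relation(relations, c, right):
            return c
    return None

# Toy relations: two adjacent identifiers have no relation (a missing ";").
RELATIONS: Relations = {("id", ";"): ">", (";", "id"): "<", ("END", "id"): "<"}
print(find_insertion("id", "id", RELATIONS, ["END", ";"]))  # ;
```

"END" is rejected because it has no relation with the preceding identifier, so ";" is the symbol inserted. When the search returns None, the method would fall back to the designer's table of erroneous productions.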

META-PI (76), a top-down interactive compiler-compiler, allows
a CLAMP operation to be used in the syntax. This operation prevents
backtracking past it; any need to backtrack past it is considered
an error. It is not clear what is done when an error is detected. A
later chapter will discuss an algorithm for automatically achieving the
effect of the CLAMP operation in top-down compilers.

Some specific compilers and some special topics can be noted
for contributions to syntactic error recovery even though they have been
designed for specific systems. One of these is the use of spelling correction
described by Morgan (74). He describes the use of a spelling correction
algorithm, based on work by Damerau (24) and Freeman (45), to correct
simple spelling errors in language processors and operating systems. The
corrections were limited to a single letter change, a single letter
insertion, a single letter deletion, or an interchange of two adjacent letters.
These changes were able to correct over 80% of the spelling errors in
several hundred programs studied by Morgan at Cornell. No attempt was
made to fix the somewhat less than 20% remaining. This algorithm could
reasonably be applied at some points in the algorithms discussed later.
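A test for the four kinds of single-letter error listed above can be written as follows. This is a minimal sketch of a Damerau-style check, not Morgan's implementation; the vocabulary of reserved words in the example is hypothetical.

```python
from typing import List

def within_one_edit(a: str, b: str) -> bool:
    """True if b equals a, or differs from it by one letter change, one
    letter insertion, one letter deletion, or one swap of adjacent letters."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True                      # single letter change
        if len(diffs) == 2:
            i, j = diffs
            return j == i + 1 and a[i] == b[j] and a[j] == b[i]  # adjacent swap
        return False
    if len(a) > len(b):
        a, b = b, a                          # make a the shorter string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    return a[i:] == b[i + 1:]                # b is a with one letter inserted

def candidates(word: str, vocabulary: List[str]) -> List[str]:
    """Vocabulary words reachable from `word` by at most one such edit."""
    return [v for v in vocabulary if within_one_edit(word, v)]

print(candidates("BEGNI", ["BEGIN", "END", "GOTO"]))  # ['BEGIN']
```

Deletion needs no separate case: deleting a letter from one word is inserting it into the other, so the check simply swaps the arguments so that the shorter word comes first.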

The Illinois ALCOR 7090 ALGOL compiler (50) is worthy of mention
because of the thought its designers gave to its error recovery methods.
Its performance will be seen later, where it is used as a comparison with a
compiler produced by the system described below.

WATFOR (23, 87) and DITRAN (75) are both FORTRAN-based compilers
which have been designed with the idea of improving the error detection
and diagnosis capabilities at all of levels 3, 4, and 5 mentioned
earlier. The goal is to achieve rapid turnaround and throughput for
student-type programming and to give as many error detection aids as
possible, described as clearly as possible. Both seem to have achieved their
goal reasonably well.

Evans (34) describes a Floyd production language compiler for
ALGOL 60 in which there is a fairly complete set of error recovery
productions.

Chapter fifteen of Gries (49) gives some good principles for
error recovery in several parsing algorithms. These are aimed toward
helping in the construction of specific compilers as opposed to general
automatic techniques.

Discussion of error recovery techniques is either absent or
incomplete in most other compiler and system descriptions. However,
their references are included here to provide greater documentation of
the compiler building field for subsequent readers. Other work in the
area of translator writing systems, parser generators, and recognition
algorithms includes the work of Brooker and Morris (12, 13, 14, 15, 16,
17, 81), Cheatham (19), Schneider and Johnson (85), Backes (3), DeRemer
(26, 27), Earley (28, 29, 30), Eickel (31, 32), Ferentzy and Gabura (39),
Floyd (40, 41, 42, 44), Ingerman (53), Korenjak (63), Ungar (91), and
Warshall (95). Some other descriptions of compilers for specific
languages can be found in Irons (54), Kanner, et al. (61), Randell and Russell
(79), Resnick and Sable (80), Uracheva (92), and Wirth and Weber (99).
Syntactic and semantic specification are discussed in Feldman (36, 37),
Ledgard (67), Machado (71), Schorre (86), and Whitney (96). Some special
topics relating to parsing and compilers can be found in Cohen and
Nguyen-Dinh (21), Gries (48), Kahan and Dumas-Primbault (61), Rosenberg
(82), and Samelson and Bauer (84), and articles of a survey or general
nature include Cheatham and Sattley (20), Feldman and Gries (38),
Floyd (43), Irons (56), Lomet (70), and Samelson (83).

The main point to be derived from all of these references is that
there has been much ad hoc work on syntactic error recovery, some not very
effective, but surprisingly little of an automatic or algorithmic nature.

1.3 Philosophy of This Work

Since the programmer was actually trying to produce a string of
symbols which would constitute a legal program, the strings of symbols
given to a compiler will ordinarily differ only slightly from legal
strings. Hence the error recovery need only concern itself with the set
of strings which differ in only minor ways from a valid program and not
with the entire set of possible strings. In the method described below,
small perturbations of the string in the vicinity of the error are
considered in an attempt to find one which is syntactically correct.
However, provision must be made for some recovery technique for cases
in which the string is further from being a valid string than any of
the attempted perturbations. The success reported in later chapters
supports the validity of this approach.

There has been debate about whether a compiler should
attempt to "fix" errors. The advocates say that as long as the
compiler can make some sense out of the program, it should go ahead and
produce something and run it in order to let the programmer find as many
errors from all levels as possible from each run. The opponents say
that the programmer will come to depend on the compiler to fix his errors
and hence tend to become sloppy in his programming, or that an incorrect
program will be compiled and run without any indication that there is an
error in it. The error recovery system described in this thesis "corrects"
errors but is not affected by the points of the above discussion for two
reasons. First, it is part of a compiler building system, and the decision
about whether to have the compiler attempt to run a program which had
syntactic errors is up to the language designer. In all cases in
which the system has actually been used, the compilers would not run
programs with syntactic errors. Second, the error correction arises out of
a different motivation: an attempt to be as clear to the programmer
as possible as to the nature of his error and to enable the compiler to
continue parsing with a minimum loss in continuity, with the goal that all
the syntactic errors would be detected on the first run and be accurately
described to the programmer. The compiler's attempt to fix the error
is the result of its attempt to analyze the error situation to such an
extent that it is able to tell the programmer exactly what his mistake
was and to direct the parser on the most desirable parsing path.

1.4 Measurement of Effectiveness of Error Recovery

What is needed is a method of measuring the effectiveness of
error recovery which gives a numerical basis to the intuitive notions
of more and less effective error recovery. In order to arrive at such
a measure, we must begin with the goals of error recovery. One is to
get the compiler parsing again. This is actually part of the solution of
the other goals and so can be ignored for the moment. The ultimate goal
of error recovery is to minimize the programmer's effort in getting a
syntactically correct program. This effort is determined by the number
of runs he is required to make and by the clarity of the error messages,
that is, the time and energy spent in understanding the error messages
he receives. Although the clarity of the messages is somewhat subjective,
since it depends to some extent upon the programmer, the number of runs
is determined by two things: the number of errors missed and the number
of extraneous error messages. The latter have the same effect as missed
errors when they occur in large numbers, since the programmer gives up
looking through the extraneous messages trying to find the correct ones.
The errors missed are the result of the recovery having to skip symbols
to recover, which in turn is largely the result of incorrect or poor
recoveries. The extraneous errors are also the result of incorrect or
poor recoveries. The clarity of the messages could be determined by the
way they are presented or by their content, but it could also depend on
the accuracy of the recovery and to some extent on the confusion caused
by the extraneous error messages. Figure 1 illustrates this
interdependence of variables, where each higher item is influenced by
those lower ones which have arrows pointing to it.

[Diagram, top to bottom: programmer's effort to correct syntactic errors;
number of runs required; clarity of error descriptions; number of extra
errors; number of errors missed; number of incorrect recoveries;
effectiveness of compiler's syntactic error recovery]

Interdependence of Error Recovery Variables
Figure 1

In order to measure the effectiveness of the error recovery
for a particular compiler on a given set of runs of one or more programs,
the following easily obtained quantities were used in the examples
described later: M - the number of errors missed in all runs,
N - the number of errors detected, and X - the number of extraneous
errors produced which were the result of the recovery and not the result
of any direct action on the programmer's part. Further, the recoveries
from the errors detected were rated on a 4-level scale: E - excellent
(the recovery was able to identify the string the programmer actually
intended and told him precisely what was wrong), G - good (the recovery
was not exact but close enough that the programmer could tell almost as
easily as in the former case what the error actually was), F - fair
(the recovery told the programmer where there was an error but what it
said about it did not help him much in identifying the actual error),
P - poor (the recovery could easily have misled the programmer as to the
cause of the error, making him choose the wrong correction or look in
the wrong place). The following formula, used to measure the
effectiveness of the error recovery of a compiler for a given set of runs,
has the value 0 for a set of recoveries in which all errors were missed and
the value 1 for a set of recoveries in which all errors were detected
with a rating of "E" and no extras were created. The measure is
proportionately reduced by missed errors, by extra errors, and by poorer
recoveries.


$$\text{effectiveness} = \frac{E + \tfrac{3}{4}G + \tfrac{1}{2}F + \tfrac{1}{4}P}{N + M} \cdot \frac{N}{N + X}$$


where E, G, F, and P are the numbers of error recoveries with ratings of
"E", "G", "F", and "P" respectively, so that E + G + F + P = N.

This formula gives a measure of the effectiveness of error recovery
which corresponds to intuitive notions of effectiveness and is based on
numerical quantities.
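The measure is simple to compute for a set of runs. The sketch below assumes the weights 1, 3/4, 1/2, and 1/4 for the E, G, F, and P ratings as read from the formula, and treats a set of runs with no detected errors as having effectiveness 0; the function name and argument order are illustrative only.

```python
def effectiveness(E: int, G: int, F: int, P: int, M: int, X: int) -> float:
    """Error recovery effectiveness for one set of runs.

    E, G, F, P: detected errors whose recoveries were rated excellent, good,
    fair, poor (so N = E + G + F + P); M: errors missed; X: extra errors.
    """
    N = E + G + F + P
    if N + M == 0:
        raise ValueError("no errors in this set of runs")
    weighted = E + 0.75 * G + 0.5 * F + 0.25 * P   # assumed weights 1, 3/4, 1/2, 1/4
    extras_factor = N / (N + X) if N + X > 0 else 0.0
    return weighted / (N + M) * extras_factor

print(effectiveness(E=10, G=0, F=0, P=0, M=0, X=0))  # 1.0
print(effectiveness(E=0, G=0, F=0, P=0, M=5, X=0))   # 0.0
```

The two printed cases are the end points described in the text: every error detected with an excellent recovery and no extras gives 1, and every error missed gives 0. Missed errors lower the first factor and extra errors lower the second.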

1.5 Overview of the Thesis

The structure of the rest of this thesis is as follows:

Chapter 2 describes the compiler building system, consisting
of the syntax description language TWINKLE, the TWINKLE compiler, the
modified Floyd production language, the algorithm that converts the
Backus-Naur form productions output by the TWINKLE compiler into modified
Floyd productions, the operation of the parser, and the optimization of
the parsing tables.

Chapter 3 then discusses the error recovery operation in the
parser and the changes made to the parsing table by the preprocessor for
the error recovery.

The DEMALGOL test compiler is described in chapter 4, and the
results of its error recovery on actual programs are compared with those of
the Burroughs ALGOL compiler and the ALCOR Illinois 7090 ALGOL compiler.

Chapter 5 presents the results for three other compilers
implemented on this system. For one of these, there was already another
compiler with which it could be compared. The results of these four
compilers demonstrate that not only is automatic error recovery based
on the syntax of the language possible but that it can come reasonably
close to the ultimate goal of finding all the syntactic errors in the
program and describing each one accurately and clearly.

Chapter 6 summarizes some observations, based on all the examples
that were run, that suggest language design considerations which improve
error recovery in this system.

Chapter 7 discusses the changes that are necessary to extend
this method of automatic error recovery to recursive descent parsing.
The primary factor here is the detection of the error. A compiler for
the DEMALGOL language was built with a recursive descent compiler-building
system and modified according to the algorithms presented in
chapter 7.

This compiler and its results on the same programs
as were used for the compiler described in chapter 4 are described in
chapter 8.

Additional language design considerations for recursive descent
parsing are given in chapter 9, and the final chapter gives some
applications and further research which can be built on the work reported
here.


CHAPTER 2. THE TRANSLATOR WRITING SYSTEM

2.1 Overall System Description

The translator writing system in which this system of syntactic
error recovery has been developed consists of five programs and one file
of procedures written in Burroughs extended ALGOL for the B5500. These
components are:

1. TWINKLE/DISK. A compiler for the syntax description
language TWINKLE which produces a Backus Naur form description
of the language being processed.

2. BNF2FPL/TWS. A program which converts the Backus Naur
form productions into modified Floyd productions.

3. FPL2PAR/TWS. A program which generates parsing tables
from the Floyd productions.

4. PAR2ALG/TWS. A program which generates B5500 ALGOL source
code from the parsing tables.

5. ISL/DISK. A compiler which translates the language
semantics description written in ISL into B5500 ALGOL source code.

6. TWS/FILES. A file of B5500 ALGOL source code containing
the scanning and parsing procedures.

These components will be discussed in separate sections below,
except that ISL/DISK will not be discussed since this thesis is concerned
only with syntactic matters. Also, the parsing procedures in TWS/FILES
will be discussed along with the parsing table generation program
FPL2PAR/TWS.
The error recovery techniques which are the contribution of
this thesis are described in chapter 3 but involve components 2, 3, and
6 above.

A discussion of the structure of this system, the semantics
language ISL, and the structure of compilers built with this system can
be found in Machado (71).

2.2 The TWINKLE Syntax Description Language and Compiler

The TWINKLE language is based on Backus Naur form, but it
extends Backus Naur form in two different ways. First, it has some
additional forms of expression which simplify the definition of several
language structures, such as lists and sets of symbols. An example of
a list is the <compound tail> in ALGOL, which is a list of <statement>s
separated by ";". An example of a symbol set is "any character except
;". Second, it allows the language description to be English-like so
that the syntactic description of the language which is used to construct
the compiler can also be given to users of the language to explain the
syntactic structure of the language. For example, the two examples
given above could be stated "A <COMPOUND TAIL> CONSISTS OF A LIST OF
<STATEMENT>S SEPARATED BY SEMICOLONS" and "ANY CHARACTER BUT SEMICOLON".
The English form is not required, and some very compact symbolic
notations are also allowed.
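As an illustration of the kind of BNF these two constructs stand for, the sketch below expands a list into a left-recursive pair of productions and a symbol set into one alternative per allowed character. The function names, the choice of left recursion, and the small alphabet are assumptions made for the example; the expansion TWINKLE actually performs, including any dummy productions, is the one given in Mercer (73).

```python
from typing import Iterable, List

def expand_list(name: str, item: str, separator: str) -> List[str]:
    """BNF for '<name> consists of a list of <item>s separated by separator'.

    Assumed left-recursive form: <name> ::= <item> | <name> sep <item>.
    """
    return [
        f"<{name}> ::= <{item}>",
        f"<{name}> ::= <{name}> {separator} <{item}>",
    ]

def expand_set(name: str, alphabet: Iterable[str], excluded: Iterable[str]) -> List[str]:
    """BNF for '<name> is any character but ...': one alternative per
    character of a (hypothetical) alphabet that is not excluded."""
    skip = set(excluded)
    return [f"<{name}> ::= {c}" for c in alphabet if c not in skip]

for line in expand_list("compound tail", "statement", ";"):
    print(line)
# <compound tail> ::= <statement>
# <compound tail> ::= <compound tail> ; <statement>
```

Left recursion is used here only because the grammar of Figure 3 (B → Bd) shows that the Floyd production conversion accepts it; a right-recursive expansion would describe the same strings.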

The TWINKLE compiler produces a Backus Naur form description
of the language, inserting dummy productions where necessary, in such a
way as to prepare the grammar for the algorithm that converts Backus Naur
form to modified Floyd production language. A complete description of the
TWINKLE language and the TWINKLE to Backus Naur form conversion can be
found in Mercer (73).

2.3 Conversion of the Backus Naur Form of the Language to Modified
Floyd Production Language

The next step in the syntax preprocessing is the construction
of modified Floyd productions from the Backus Naur form (BNF) of the
language. These productions are arranged in groups which correspond
to specific situations in the BNF grammar. This section describes
these Floyd productions, the significance of the grouping of the Floyd
productions, and the generation of the Floyd productions from the BNF
grammar.

Each of these modified Floyd productions consists of four
parts: (1) a test of the current symbol, i.e. either the current
symbol from the input source program or a nonterminal to which a reduction
was just made, (2) a forward context test of the next few symbols of the
input source program beyond the current symbol, (3) a set of semantic
actions, and (4) a set of parser actions. Any one of these parts except
the parser action part may be empty. The parser action part consists of
four different types of actions: (1) whether or not the stack that is
maintained for semantic purposes is reduced, (2) whether or not a new
symbol from the input source program is put on the top of the semantics
stack, (3) what action, if any, is taken with regard to a marker stack,
i.e. whether one or more markers are popped, a new marker is put on the
top, or neither, and (4) which is the next group of productions to be
attempted. The significance of the markers will be indicated below.
These modified Floyd productions are arranged in groups,
some of which have an error production at the end. Testing usually
begins with the first production in the group. Figure 2 illustrates
a typical Floyd production group. α1 and α2 are the current symbol or
stack tests, β1 and β2 are the lookahead tests, → Nj means reduce the
parsing or semantics stack to nonterminal Nj, ← Nm means push marker
Nm onto the marker stack, * means take a new symbol from the input
source program and put it at the top of the semantics stack, and
Gk, Gn, and Gp are the names of the next groups to be executed.


    G1:  α1 | β1    → Nj          Gk
         α1 | β2    ← Nm    *     Gn
         α2 |               *     Gp
         ERROR


Example of a Floyd Production Group
Figure 2
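The four-part structure of a production and the in-order testing of a group can be modeled roughly as below. This is a simplified sketch under stated assumptions: the field names are hypothetical, marker popping and the semantic actions are not modeled, and the first production whose tests succeed is taken to be the one applied. The example group is NTPN(1,2)-B from Figure 4.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

@dataclass
class FloydProduction:
    current: Optional[str]                 # current-symbol test; None = no test
    lookahead: Tuple[str, ...] = ()        # forward-context test; () = no test
    semantics: List[str] = field(default_factory=list)  # semantic actions (not run here)
    reduce_to: Optional[str] = None        # "-> N": reduce the stack to nonterminal N
    push_marker: Optional[str] = None      # "<- N'": push marker N' on the marker stack
    shift: bool = False                    # "*": take a new input symbol
    next_group: Optional[str] = None       # group to be tried next

def matches(p: FloydProduction, current: str, upcoming: Sequence[str]) -> bool:
    """Both tests must succeed; an empty test always succeeds."""
    if p.current is not None and p.current != current:
        return False
    return tuple(upcoming[:len(p.lookahead)]) == p.lookahead

def select(group: List[FloydProduction], current: str,
           upcoming: Sequence[str]) -> Optional[FloydProduction]:
    """Try the productions in order; None stands for the group's ERROR case."""
    for p in group:
        if matches(p, current, upcoming):
            return p
    return None

# Group NTPN(1,2)-B of Figure 4: "B | d  TPN(2,2)" and "B |  -> A".
NTPN_1_2_B = [
    FloydProduction("B", ("d",), next_group="TPN(2,2)"),
    FloydProduction("B", reduce_to="A"),
]
print(select(NTPN_1_2_B, "B", ["d"]).next_group)  # TPN(2,2)
```

With any lookahead other than d the second production applies and B is reduced to A, while a current symbol other than B falls through to the error case. Ordering the productions so that the more specific lookahead test comes first is what makes this first-match rule work.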

Four basic types of Floyd production groups are generated,
corresponding to four different situations in the BNF grammar. To see
how these groups are determined, consider Figures 3 and 4. Figure 3
gives the BNF productions for a small language, and Figure 4 gives the
Floyd productions produced for this language.


    A → CB
    B → Bd
    B → DE
    C → F
    F → f
    D → Fg
    E → e


A Small BNF Grammar
Figure 3


TH-A:          f |     → F
TH-B:          f |     → F
TH-E:          e |     → E
NTH-A:         F |     → C
               C |     ← B'    TH-B
NTH-B:         F |     TPN(6,2)
               D |     ← E'    TH-E
NTPN(1,2)-B:   B | d   TPN(2,2)
               B |     → A
NTPN(3,2)-E:   E |     → B
TPN(2,2):      d |     → B
TPN(6,2):      g |     → D

Modified Floyd Productions of Grammar in Figure 3
Figure 4
There is one group for each terminal symbol in a non-first position,
called TPN for "terminal as the Nth symbol (N > 1) of production P", and
one group for each nonterminal symbol in a non-first position, called
NTPN for "nonterminal as the Nth symbol (N > 1) of production P". Each
nonterminal symbol which has a non-first occurrence has a TH group (for
"terminal head symbol") containing all the terminal symbols which are in
the first position on some derivation for this nonterminal, and an NTH
group (for "nonterminal head symbol") containing all the nonterminal
symbols which are in the first position on some derivation for this
nonterminal. Figures 3 and 4 give examples of all four of these kinds of
groups.
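The TH and NTH groups of Figure 4 can be cross-checked by computing, for each nonterminal of Figure 3, the symbols that can stand first in one of its derivations. The Python sketch below assumes single-character symbols (uppercase for nonterminals, lowercase for terminals) and leaves a nonterminal out of its own NTH set, on the reading that a reduction matching the top marker is routed to the NTPN group instead; `heads`, `th_set`, and `nth_set` are hypothetical names, not part of BNF2FPL/TWS.

```python
from typing import Dict, List, Set

# Grammar of Figure 3: uppercase letters are nonterminals, lowercase are terminals.
GRAMMAR: Dict[str, List[str]] = {
    "A": ["CB"],
    "B": ["Bd", "DE"],
    "C": ["F"],
    "F": ["f"],
    "D": ["Fg"],
    "E": ["e"],
}


def heads(nonterminal: str) -> Set[str]:
    """All symbols that can appear first in some derivation of `nonterminal`."""
    seen: Set[str] = set()
    work = [nonterminal]
    while work:
        for rhs in GRAMMAR[work.pop()]:
            first = rhs[0]
            if first not in seen:
                seen.add(first)
                if first.isupper():      # follow nonterminals further down
                    work.append(first)
    return seen


def th_set(nonterminal: str) -> Set[str]:
    """Terminal head symbols: the stack tests of the TH group."""
    return {s for s in heads(nonterminal) if s.islower()}


def nth_set(nonterminal: str) -> Set[str]:
    """Nonterminal head symbols other than the nonterminal itself (assumed to be
    handled through the NTPN group by the marker comparison)."""
    return {s for s in heads(nonterminal) if s.isupper() and s != nonterminal}


for nt in ("A", "B", "E"):
    print(nt, sorted(th_set(nt)), sorted(nth_set(nt)))
# A ['f'] ['C', 'F']
# B ['f'] ['D', 'F']
# E ['e'] []
```

The results agree with Figure 4: TH-A and TH-B test only f, TH-E tests only e, NTH-A tests F and C, NTH-B tests F and D, and no NTH-E group is needed.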

The significance of these groups in the parser is as follows.
Whenever the portion of a production to the left of a terminal symbol is
identified, the TPN group is executed. (Since this group has just one
production, it is put in as a continuation of the current Floyd
production.) If the next symbol is a nonterminal symbol, a marker is put
on the marker stack indicating the nonterminal, the NTPN group to apply
when this nonterminal is recognized, and the NTH group for this
nonterminal. Control is then transferred to the TH group for this
nonterminal. When the symbol on the right end of a BNF production is
recognized, the next Floyd production is determined by comparing the
name on the left-hand side of the BNF production with the name of the
top marker in the marker stack. If the names are the same, the NTPN group
indicated by the marker is the next group executed; otherwise, the NTH
group indicated by the marker is the one executed.
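The marker comparison just described can be sketched in a few lines of Python. This is a minimal illustration assuming that a marker carries only the sought nonterminal and the names of its NTPN and NTH groups; the `Marker` class and `next_group_after_reduction` function are hypothetical, and popping of markers (a separate parser action) is not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Marker:
    name: str        # nonterminal being sought
    ntpn_group: str  # group to execute when this nonterminal is recognized
    nth_group: str   # group to execute when some other nonterminal is recognized


def next_group_after_reduction(lhs: str, markers: List[Marker]) -> str:
    """Pick the next group once the right end of a BNF production with
    left-hand side `lhs` has been recognized."""
    top = markers[-1]
    return top.ntpn_group if lhs == top.name else top.nth_group


# After C has been recognized in A -> CB, the parser is seeking B (Figure 4).
markers = [Marker("B", "NTPN(1,2)-B", "NTH-B")]
print(next_group_after_reduction("F", markers))  # NTH-B
print(next_group_after_reduction("B", markers))  # NTPN(1,2)-B
```

Recognizing F while B is sought leads into NTH-B, which then continues with TPN(6,2) as in Figure 4; only a completed B selects the NTPN group.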
Preclusion conflicts between Floyd productions are resolved
by forward contextual analysis or by the formation of new groups which
are the combination of the next groups for each production in conflict.
This gives rise to combined markers and to CTPN, CNTPN, CTH, and CNTH
groups in addition to the groups mentioned above.

In  some  cases,  the  grammar  is  ambiguous  because  the  resolu- 
tion of  the  conflict  depends  on  semantic  information;  for  example, 
<identifier>  may  be  a  <Boolean  primary>  or  an  <arithmetic  primary> 
depending  on  information  stored  in  an  identifier  table  in  the  semantics. 
A  semantic  test  is  used  in  this  case  to  resolve  the  conflict.   In  the 
parser,  this  is  the  same  as  a  regular  semantic  action  call  except  that 
when  control  returns  from  the  semantic  routine,  the  global  Boolean 
variable  SEMANTICTEST  is  tested.   If  this  variable  is  false,  this  pro- 
duction is  terminated  and  the  next  one  is  tried.   In  the  BNF  to  FPL 
conversion,  any  semantic  test  is  assumed  to  resolve  any  conflict  associated 
with  the  stack  symbol  preceding  the  test  in  the  BNF.   The  details  of  this 
algorithm  for  converting  BNF  to  modified  Floyd  production  language  can 
be  found  in  Beals  (7,8). 
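The semantic test mechanism can be pictured with the <identifier> example above. In this Python sketch the identifier table, the routine `is_boolean_identifier`, and the driver `reduce_identifier` are all hypothetical; only the global flag name SEMANTICTEST and the rule "if false, abandon this production and try the next one" come from the text.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical identifier table maintained by the semantic routines.
IDENTIFIER_TABLE: Dict[str, str] = {"done": "Boolean", "count": "integer"}

SEMANTICTEST = False  # global flag inspected by the parser after a semantic test


def is_boolean_identifier(name: str) -> None:
    """A semantic routine used as a test: it only sets SEMANTICTEST."""
    global SEMANTICTEST
    SEMANTICTEST = IDENTIFIER_TABLE.get(name) == "Boolean"


# Productions competing for <identifier>: (optional semantic test, reduction).
CANDIDATES: List[Tuple[Optional[Callable[[str], None]], str]] = [
    (is_boolean_identifier, "<Boolean primary>"),
    (None, "<arithmetic primary>"),
]


def reduce_identifier(name: str) -> str:
    for test, target in CANDIDATES:
        if test is not None:
            test(name)            # called like a regular semantic action ...
            if not SEMANTICTEST:  # ... then the global flag is examined
                continue          # false: abandon this production, try the next
        return target
    raise RuntimeError("no production applies")


print(reduce_identifier("done"))   # <Boolean primary>
print(reduce_identifier("count"))  # <arithmetic primary>
```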

The program BNF2FPL/TWS makes some small optimizations to the
set of Floyd productions it generates. One of these is placing a TPN
production in sequence after the production which calls it, as
mentioned above. Another is using just one group in place of several
identical ones. This reduces the number of Floyd productions generated
and sometimes makes further optimizations possible.
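Sharing one group among several identical ones amounts to keying groups by their contents. The Python sketch below is an illustration under the assumption that a group can be compared as a tuple of production texts and that transfers to a dropped group are redirected through an alias map; `share_identical_groups` is a hypothetical name, not the program's routine.

```python
from typing import Dict, Tuple

Group = Tuple[str, ...]  # a group written as a tuple of production texts


def share_identical_groups(groups: Dict[str, Group]) -> Tuple[Dict[str, Group], Dict[str, str]]:
    """Keep the first of any set of identical groups and map every group name
    to the name of the group that will actually be used."""
    first_name: Dict[Group, str] = {}
    kept: Dict[str, Group] = {}
    alias: Dict[str, str] = {}
    for name, body in groups.items():
        if body not in first_name:
            first_name[body] = name
            kept[name] = body
        alias[name] = first_name[body]
    return kept, alias


# TH-A and TH-B of Figure 4 contain the same single production.
groups = {
    "TH-A": ("f | -> F",),
    "TH-B": ("f | -> F",),
    "TH-E": ("e | -> E",),
}
kept, alias = share_identical_groups(groups)
print(list(kept), alias["TH-B"])  # ['TH-A', 'TH-E'] TH-A
```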

Also, a table of following symbols is constructed. This table
will be referred to again below, where the error situations in which it
is used are discussed. This table contains a bit row for each marker
symbol and each TPN production. Each row has marked all the terminal
symbols which can immediately follow the symbol which references that
row.

At the conclusion of BNF2FPL/TWS, a file of Floyd production
groups has been created which corresponds to the BNF grammar for the
language. Also created is a table of following symbols which will be
used by the error recovery procedures as described in Chapter 3.

2.4 Construction and Use of Parser Instruction Tables

The completed file of modified Floyd productions is then
converted into a table of parser instructions by the program
FPL2PAR/TWS. The format of this table is given in Figure 5. Also at this
time, four other tables are created or added to. Tests for lookaheads of
length greater than one are put in one table; multiple semantic action
calls are put in another; the combined markers are put in a third; and
additions are made to the symbol set table. The meaning and use of these
parser instructions and tables are discussed briefly in this section.

The syntax for the language of the parser is given in Figure 6.
Except for the error production or finis production, each production has
four parts: a stack test, a lookahead test, semantic actions, and parser
actions. A successful test in each part causes the parser to move to
the next part.

If the stack test fails, then one of two actions is taken
depending on which parser instruction made the stack test. If the
instruction was either the XSBS or XSBB instruction of Figure 6, then
the <failure skip number> indicated in Figure 6 is used to determine the
distance down the table to the next production to be applied. If the
instruction was either the XSIS or XSIB instruction of Figure 6, an
error procedure is called to insert the missing symbol and the next part
of this production is applied. The <none> stack tests are always
successful. SKIP is used in those groups which have stack tests, namely
TH, NTH, CTH, CNTH, and CTPN. NONE is used in groups without stack
tests, NTPN and CNTPN. ILVL is used in TPN productions. FDNT is used on
the first nonrecursive production in an NTPN or CNTPN group, and CNONE
is used in CNTPN groups following the production with FDNT. The purpose
of all these different null stack tests is to allow TPN productions to
be skipped over after a false lookahead or semantic test, to set flags,
and to serve as markers for the error recovery mechanism.

If a lookahead test XLS, XLB, or LK fails, then the next
non-TPN production in sequence is applied. This production will have a
<none> stack test, so another lookahead test will be applied. If a
lookahead test XLRR or XLNR fails, then an error procedure is called to
recover from the error in the next symbol. This procedure will reset the
production pointer to the appropriate production following the recovery
measures.

If a semantic test fails, the production pointer is incremented
to point to the next production unless this production is the last
production for this stack test symbol. In this case, an error message is
printed and the test is considered true.

The parser action section consists of four possible separate
actions. A number of markers may be popped from the marker stack; a new
marker may be pushed onto the marker stack; the semantics stack may be
reduced; and a new address in the production table is taken in one of
four ways: 1. If a reduction to a nonterminal is made, then the address
is taken from the marker symbol according to whether or not the
nonterminal matches the marker name; 2. A new input symbol is taken and
the next production in sequence is applied; 3. A next address is given
for the next production; 4. A new input symbol is taken and a next
address is given for the next production. If transfer case 2 occurs, the
next production is a TPN production and is considered to be part of this
one. Its stack test will be XSIS, XSIB, or ILVL, and these cases will be
skipped over in the XLS, XLB, LK, and semantic test failures mentioned
above.

<modified Floyd production> ::= <comparison production> /
        <error production> / <finis production>
<comparison production> ::= <stack comparison> <lookahead comparison>
        <semantic actions> <parser actions>
<stack comparison> ::= <none> / <symbol test> / <symbol set test> /
        <unique symbol test> / <unique symbol set test>
<none> ::= NONE / CNONE / SKIP / ILVL / FDNT
<symbol test> ::= XSBS <symbol> <failure skip number>
<symbol set test> ::= XSBB <symbol set> <failure skip number>
<unique symbol test> ::= XSIS <terminal symbol> <following symbol set>
<unique symbol set test> ::= XSIB <symbol set> <following symbol set>
<symbol> ::= <terminal symbol> / <nonterminal symbol>
<lookahead comparison> ::= <lookahead symbol test> / <lookahead symbol set test> /
        <multisymbol lookahead test> / <lookahead error> / empty
<lookahead symbol test> ::= XLS <terminal symbol>
<lookahead symbol set test> ::= XLB <symbol set>
<multisymbol lookahead test> ::= LK <row in multisymbol lookahead table>
<lookahead error> ::= XLRR <following symbol set> / XLNR <following symbol set>
<semantic actions> ::= ACT <action number> / TEST <action number> /
        MANY <row in action sequence table> <number of actions in the row> / empty
<parser actions> ::= <pop markers> <reduce stack> <push marker> <transfer>
<pop markers> ::= POP <number of markers to be popped> / empty
<reduce stack> ::= RED / empty
<push marker> ::= PUSH <marker> / empty
<transfer> ::= DNT <nonterminal symbol> / SCAN / GO <address> / SCAN GO <address>
<marker> ::= <nonterminal symbol> <match address> <non match address>
        <following symbol set> / <row in combined marker table>
        <number of markers in the row> <non match address>
<error production> ::= ERRN <nonterminal symbol> <address> / ERRR / ERNR
<finis production> ::= EXIT

Syntax of the Parser Language
Figure 6
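The two ways a stack test can fail, as described in Section 2.4, can be summarized in a small Python sketch. It assumes that the <failure skip number> is an offset added to the current production address and uses an exception to stand for the call to the symbol-insertion procedure; `StackTest`, `MissingSymbol`, and `apply_stack_test` are hypothetical names, not the parser's own.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

NULL_TESTS = {"NONE", "CNONE", "SKIP", "ILVL", "FDNT"}  # the <none> stack tests


@dataclass
class StackTest:
    op: str                                # XSBS, XSBB, XSIS, XSIB, or a <none> test
    symbols: FrozenSet[str] = frozenset()  # one symbol for XSBS/XSIS, a set for XSBB/XSIB
    skip: int = 0                          # <failure skip number> (XSBS/XSBB only)


class MissingSymbol(Exception):
    """Signals where the parser would call its symbol-insertion procedure."""


def apply_stack_test(address: int, test: StackTest, top: str) -> Optional[int]:
    """Return None if the test succeeds (go on to the lookahead part of the
    same production); otherwise return the address of the next production."""
    if test.op in NULL_TESTS or top in test.symbols:
        return None
    if test.op in ("XSBS", "XSBB"):
        return address + test.skip         # skip down the table (assumed offset)
    if test.op in ("XSIS", "XSIB"):
        raise MissingSymbol(sorted(test.symbols))  # a failure here is an error
    raise ValueError(f"unknown stack test {test.op}")


print(apply_stack_test(40, StackTest("XSBB", frozenset({"e", "f"}), skip=3), "g"))  # 43
```

Keeping the null tests as explicit opcodes that always succeed matches their stated role: they cost nothing at parse time but leave recognizable points in the table for skipping TPN productions and for the error recovery mechanism.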

This  section  has  given  a  basic  description  of  the  modified 
Floyd  production  language  as  implemented  in  the  parser,  the  parsing 
operators,  and  how  those  operators  are  used. 

2.5  Optimization  of  the  Parser  Instruction  Table 

The parser represented by the parsing table is considerably
optimized over the set of Floyd productions produced by the conversion
from BNF. This section describes four optimizations which FPL2PAR/TWS
makes to the parsing table. These optimizations are: 1. the combining
of single symbol tests into symbol set tests where possible; 2. using one
group of Floyd productions for two or more which are identical;
3. eliminating unnecessary markers and stack reductions; and
4. introducing direct transfers within NTH groups.

Wherever possible in stack tests or lookahead tests, sequential
tests are combined into single set membership tests, sometimes creating
new symbol sets which are added to the symbol set table. Lookahead
tests of one symbol can always be combined into one symbol set test, as
can the nth symbol tests (n = 2, 3, or 4) of a multiple symbol lookahead
test of length n. The stack tests of two consecutive productions in a
group can be combined into a symbol set test if all the other
corresponding parts (lookahead tests, semantic actions, and parser
actions) are identical.
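The stack-test combination rule can be sketched as a single pass over a group that merges a production into its predecessor whenever the lookahead test, semantic action, and parser action all agree. In this Python illustration the `Prod` tuple and `combine_stack_tests` function are hypothetical, and the productions are written as plain strings for brevity.

```python
from typing import FrozenSet, List, NamedTuple


class Prod(NamedTuple):
    stack: FrozenSet[str]  # symbols accepted by the stack test
    lookahead: str         # lookahead test ("" if none)
    semantics: str         # semantic action ("" if none)
    action: str            # parser action


def combine_stack_tests(group: List[Prod]) -> List[Prod]:
    """Merge consecutive productions whose other three parts are identical."""
    out: List[Prod] = []
    for prod in group:
        if out and out[-1][1:] == prod[1:]:
            out[-1] = out[-1]._replace(stack=out[-1].stack | prod.stack)
        else:
            out.append(prod)
    return out


# Group TH-D before optimization: "e | -> D" and "f | -> D".
th_d = [Prod(frozenset({"e"}), "", "", "-> D"), Prod(frozenset({"f"}), "", "", "-> D")]
merged = combine_stack_tests(th_d)
print(len(merged), sorted(merged[0].stack))  # 1 ['e', 'f']
```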

Each  new  NTPN  and  CNTPN  group  is  compared  with  the  previous 
ones  for  that  nonterminal.   If  it  has  the  same  set  of  parser  instructions, 
then  that  group  is  identified  with  the  previous  one. 

After  the  table  is  initially  completed,  all  reductions  and 
markers  that  are  not  needed  syntactically  or  semantically  are  removed. 
For example, consider the BNF syntax of Figure 7 and its Floyd
production counterpart in Figure 8. First the tests for e and f can be
combined so group TH-D of Figure 7 becomes "TH-D: {e,f} → D." Then
groups  NTPN-D  and  NTPN-D  are  the  same  so  NTPN-D  can  serve  for  both. 
Since  B  and  C  each  have  only  one  NTPN  group  which  has  a  reduction  as 
parser  action  and  contains  no  lookahead  test  or  semantic  action,  and  they 
do  not  occur  in  an  NTH  stack  test,  their  markers  are  not  needed.   Also 
since  B  and  C  each  have  only  one  NTPN  group  and  do  not  occur  in  an  NTH 
group  and  since  that  one  production  in  each  NTPN  group  contains  no 
lookahead test or semantic action, the reductions to B and C can be
replaced with the parser actions of each of their respective NTPN
productions and the NTPN groups eliminated. After eliminating B and C in
order, we have the productions of Figure 9. Now groups NTPN-D and NTPN-D'
are identical and so we have only one NTPN-D group. D now satisfies the
same criteria above which B and C satisfied, so the D markers, the
reductions to D, and the NTPN-D group can be eliminated, leaving the set
of productions of Figure 10 as the final set for the parser. Although
the system currently implemented (9) does not do it, group TH-C could
become "TH-D: {c,d} TH-D". If the NTPN group, or production in an
NTH  group  if  the  single  nonterminal  occurrence  is  in  an  NTH  group 
instead  of  an  NTPN  group,  consists  of  several  productions  distinguished 
by  lookahead  or  semantic  tests,  then  the  group  must  be  retained  but  it 
may  be  transferred  to  directly  without  reference  to  the  marker  stack. 
If  these  productions  are  in  an  NTH  group  instead  of  being  in  an  NTPN 
group,  they  are  removed  from  the  NTH  group  and  made  into  a  special  NTPN- 
like  group  to  be  transferred  to. 

After  all  of  the  above  has  been  completed,  the  reductions  which 
remain  in  an  NTH  group  are  changed  so  that  the  parser  action  just  trans- 
fers to  the  applicable  production  within  the  group  without  looking  at 
the  marker  as  long  as  the  nonterminal  named  in  the  reduction  is  not  the 
same  as  the  nonterminal  name  of  the  group  (which  would  require  the  NTPN 
group  of  the  top  marker).   Further,  if  the  new  production  has  no  lookahead 
test  or  semantic  action  and  is  not  followed  by  a  TPN  production,  then  the 
parser  action  field  of  the  new  production  is  copied  into  the  former  pro- 
duction.  As  with  the  previous  process,  this  process  is  iterative  until 
all  possible  backsubstitutions  have  been  made.   For  example,  the  group  in 
Figure  11  would  become  the  one  in  Figure  12. 

Combining parser actions consists of adding the numbers of markers
popped, using the next address and the marker or reduction name of the
second production, and ORing the rest of the parser action fields.
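The combination rule can be written as a small function. This Python sketch assumes that "the rest of the parser action fields" are Boolean flags, represented here by a reduce flag and a scan flag; the `ParserAction` fields and `combine_actions` are hypothetical stand-ins for the table's actual encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParserAction:
    pop: int = 0                        # number of markers popped
    name: Optional[str] = None          # marker pushed or nonterminal reduced to
    next_address: Optional[int] = None  # next production address
    reduce: bool = False                # flag fields assumed to be "the rest"
    scan: bool = False


def combine_actions(first: ParserAction, second: ParserAction) -> ParserAction:
    return ParserAction(
        pop=first.pop + second.pop,            # marker counts are added
        name=second.name,                      # name comes from the second production
        next_address=second.next_address,      # so does the next address
        reduce=first.reduce or second.reduce,  # remaining fields are ORed
        scan=first.scan or second.scan,
    )


combined = combine_actions(ParserAction(pop=1, reduce=True, name="B", next_address=12),
                           ParserAction(pop=1, scan=True, name="A", next_address=30))
print(combined)
# ParserAction(pop=2, name='A', next_address=30, reduce=True, scan=True)
```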
The following points summarize the optimization criteria:

1.  Set  tests  can  be  used  in  the  stack  test  field  of  pro- 
ductions in  TH,  CTH,  and  CTPN  groups  as  long  as  all  three 
other  fields,  lookahead  test,  semantic  action,  and  parser 
action  are  identical  for  each  production  combined. 

2.  A  previous  NTPN  or  CNTPN  group  may  be  used  instead  of  a 
later  one  if  they  consist  of  the  same  set  of  productions. 

3.  A  reduction  may  be  replaced  with  a  direct  transfer  if 
the  nonterminal  named  by  the  reduction  occurs  only  once, 
either  as  an  NTPN  group  or  in  only  one  NTH  group.   In  the 
latter  case,  the  production  or  productions  are  removed  from 
the  NTH  group  and  made  a  separate  special  group.   Further, 
if  there  is  just  one  production,  possibly  followed  by  TPN 
productions,  and  it  has  no  lookahead  test  and  no  semantic 
actions,  then  the  parser  actions  can  be  combined  with  the 
parser  actions  of  the  production  making  the  reduction  and 
the NTPN or NTH production removed. (If there were TPN
productions,  they  must  be  copied  after  the  production  which 
made  the  reduction.)  This  optimization  and  the  previous 
one  interrelate  in  that  the  application  of  one  of  them  can 
create  the  conditions  for  the  other  one. 

A marker can be removed if the above conditions for
removing reductions to that nonterminal are met, the NTPN
production  has  no  semantic  action  and  the  parser  action  is 
a  reduction,  and  there  is  no  NTH  group  for  this  nonterminal. 
The  conditions  for  this  optimization  can  be  generated  during 
application of the previous optimization, since it is
possible for an NTH group to disappear if all its productions
are either eliminated or made into special groups.

4. Within NTH groups, any reduction to any nonterminal other
than the one for which this group was made must come back
to this group. Hence those reductions can be made to transfer
directly to the applicable production. If the applicable
production has no lookahead test or semantic action, then the
parser action of the production making the reduction can be
combined with the parser action of the production it would
have transferred to.

2.6  Source  Code  Generation  for  the  Parsing  Tables 

The  program  PAR2ALG/TWS  converts  the  parsing  tables  to  Burroughs 
ALGOL  so  that  they  may  be  compiled  into  the  object  code  for  the  compiler. 
This  eliminates  the  need  to  have  the  file  of  parsing  tables  present  during 
execution  of  the  compiler.   Burroughs  Extended  ALGOL  FILL  statements  are 
used  to  implement  this. 

Also, multiple symbol lookahead testing maps very naturally onto
nested conditional statements, so the option is also available of
converting this table to ALGOL code and having it compiled in for direct
execution instead of being interpreted.

2.7  Summary 

This  chapter  has  discussed  the  syntax  preprocessing  portion  of 
the  translator  writing  system  from  the  language  specification  in  TWINKLE 
to the parsing tables or code generation output. More detail was given
on  the  Floyd  production  generation  and  the  construction  of  the  parsing 
table  as  these  are  more  basic  to  an  understanding  of  the  implementation 
of  the  error  recovery  mechanism  discussed  in  the  next  chapter. 
CHAPTER 3. ERROR ANALYSIS IN MODIFIED FLOYD PRODUCTION LANGUAGE

3.1  Introduction  to  Chapter  3 

At the detection of an error, three steps are taken: (1) the
fact that an error occurred is described to the programmer; (2)
appropriate recovery measures are taken; and (3) the recovery is
described to the programmer. The recovery measures used are determined
by the situation in which the error is detected. There are three
possibilities: (1) the absence of a unique required symbol; (2) a stack
test error in a group in which all productions have the same parser
action; and (3) all other cases, including lookahead errors. Within each
of these three situations, several different measures are attempted
until a recovery can be made. All three cases may make use of a string
generator procedure. All of these situations and the respective recovery
measures are described in this chapter.

3.2  Error  Detection 

Because  of  the  choice  of  parsing  algorithm,  the  identification 
of  syntactic  errors  is  automatic;  it  is  inherent  in  the  algorithm.  At 
the  end  of  a  group,  the  failure  of  all  the  stack  tests  means  the  symbol 
at  the  top  of  the  stack  is  one  which  is  not  legal  at  that  point,  i.e. 
it  is  an  error.   Similarly,  at  the  end  of  a  subgroup  or  series  of  lookahead 
tests,  if  all  of  the  lookahead  tests  fail,  there  is  an  error  in  the  symbols 
ahead. 

When an error is detected, the procedure TSKTSK is called. This
procedure prints the initial part of the error message as follows. If
the source program is not being printed, the current card is printed. An
"X" is printed under the last character scanned to indicate
approximately where on the card the error occurred. Then a line is
printed which is easily seen in a listing because it has asterisks all
across the page except for words indicating the error. This line gives
the ordinal number of the error; whether or not the error occurred in a
lookahead, which is determined by the first parameter passed to TSKTSK;
and the name of the symbol the parser was looking at, for example:

*****  13TH  ERROR  IN  LOOKAHEAD  AT  IDENTIFIER  WHATISITSNAME  *************** 
A  second  Boolean  parameter  passed  to  TSKTSK  determines  whether  a  second 
line  is  printed  which  gives  the  name  of  the  marker  at  the  top  of  the 
marker  stack,  for  example: 
*****  WHILE  SEEKING  STATEMENT 

If  the  marker  happens  to  be  a  combined  marker,  then  the  names  of  all  the 
markers  combined  are  printed,  for  example: 
*****  WHILE  SEEKING  STATEMENT  OR  DECLARATION 

After printing these lines, procedure TSKTSK is finished and
control is returned to the calling procedure for appropriate recovery
measures.
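The message layout produced by TSKTSK can be imitated with a few string operations. In this Python sketch the function names, the 72-column line width, and the 0-based column for the "X" are assumptions; the wording of the lines follows the examples above.

```python
from typing import List, Optional


def ordinal(n: int) -> str:
    """1 -> 1ST, 2 -> 2ND, 13 -> 13TH, 22 -> 22ND."""
    if 10 <= n % 100 <= 20:
        suffix = "TH"
    else:
        suffix = {1: "ST", 2: "ND", 3: "RD"}.get(n % 10, "TH")
    return f"{n}{suffix}"


def pointer_line(column: int) -> str:
    """An "X" under the last character scanned (column counted from 0)."""
    return " " * column + "X"


def error_banner(count: int, in_lookahead: bool, kind: str, text: str,
                 seeking: Optional[List[str]] = None, width: int = 72) -> List[str]:
    where = "IN LOOKAHEAD " if in_lookahead else ""
    lines = [f"***** {ordinal(count)} ERROR {where}AT {kind} {text} ".ljust(width, "*")]
    if seeking:  # names of the top marker, or of every marker in a combined marker
        lines.append("***** WHILE SEEKING " + " OR ".join(seeking))
    return lines


for line in error_banner(13, True, "IDENTIFIER", "WHATISITSNAME", ["STATEMENT", "DECLARATION"]):
    print(line)
```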

3.3 Error Recovery in Case of Unique Terminal Symbols

Recovery  from  the  error  is  based  on  which  of  several  error 
situations  is  encountered.   These  situations  are  identified  during 
the  syntax  preprocessing  and  appropriate  parser  instructions  are  placed 
in  the  parsing  table  to  signal  that  particular  error  situation. 

The first situation is that of unique terminal symbols. These
are situations in which there is only one stack test. This is the case
with TPN productions, and it is the case in TH, CTH, and CTPN groups in
which there is only one stack test. This can happen in three ways:
there is only one production; there are several productions but all have
the same stack test; or there are several productions with different
stack tests but they have all been combined into one production with a
symbol set test as described in section 2.5. The parser stack test
instruction that is used in this case is the XSIS instruction for a
single symbol test or the XSIB instruction for a symbol set test. With
these two instructions, the failure of the test constitutes an error. In
this case, after calling TSKTSK, the procedure PUTINSTACK is called to
put the correct symbol at the top of the stack. This procedure uses a
hierarchy of three levels of attempt to insert the symbol. The first is
to use the table of following symbols, mentioned previously as being
created at the time the Floyd productions are created, to determine if
the symbol in error, or the subsequent symbol, or the second symbol
following can succeed the correct symbol. This information is used to
determine if there is any simple way of fixing the input string to
correct the error.

This  attempt  at  fixing  the  input  string  is  based  on  the  table 
of  Figure  13.   The  conditions  are  tested  from  top  to  bottom  with  the  first 
one  being  applied. 

If  none  of  these  conditions  is  satisfied,  the  second  level  is 
attempted.   This  level  consists  of  calling  a  string  generator  procedure 
ERRFPINTERPRETER  which  goes  through  a  more  sophisticated  process  to 
determine  if  there  is  any  way  to  change  the  string  to  correct  the  error. 
If  there  is  such  a  way,  then  the  change  is  made  and  described  to  the 
programmer  in  the  same  manner  as  was  indicated  for  the  first  level.   Other- 
wise the  third  level  is  done. 
β = the initial portion of the source string

γ = the terminal portion of the source string

a = the symbol that is required at the top of the stack

b = any symbol which can immediately follow this 'a'

c = any symbol

| = the top of the stack

→ = if the condition on the left applies, transform it to the
    string on the right


Conditions for Inserting a Symbol
Figure 13


The  third  level  is  just  to  insert  the  correct  symbol  in  front 
of  the  one  which  is  there.   This  means  that  the  next  symbol  will  be  an 
error,  but  hopefully  by  being  one  symbol  further  along  in  the  parse,  either 
the  additional  symbol  of  context  will  allow  the  recovery  mechanism  to  fix 
the  string  or  one  of  the  other  error  recovery  techniques  which  are  based 
on  the  marker  stack  will  be  the  one  used.   The  change  is  described  to 
the  programmer  by  printing  two  lines,  one  as  the  string  was,  the  other 
as  the  string  was  changed,  for  example: 
*****  CHANGED:   IF  A 
*****      TO:   ;  IF  A 
This section has described the procedure used to recover
from errors when the error is detected by an XSIS or XSIB instruction.
The recovery is made by the procedure PUTINSTACK, which first tries a
simple method of fixing the string using the table of following symbols.
If this does not work, it uses the procedure ERRFPINTERPRETER, which
follows a more complex method to determine if the string can be fixed to
correct the error. If this method also fails, the missing symbol is
simply inserted in front of the symbol at which the error was detected.
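The three-level structure of PUTINSTACK can be sketched as a chain of repair attempts followed by an unconditional insertion. This Python sketch is illustrative only: the actual first-level conditions are those of Figure 13, and here level 1 is reduced to one assumed condition (insert the required symbol if the symbol in error may follow it); the string generator ERRFPINTERPRETER is replaced by a stub that never finds a repair; the FOLLOWS table fragment and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set

Tokens = List[str]
Repair = Callable[[Tokens, int, str], Optional[Tokens]]

# Hypothetical fragment of the table of following symbols: the symbols
# that may immediately follow ";".
FOLLOWS: Dict[str, Set[str]] = {";": {"IF", "BEGIN", "END"}}


def follow_table_repair(tokens: Tokens, pos: int, required: str) -> Optional[Tokens]:
    """Level 1, reduced to a single illustrative condition: if the symbol in
    error may follow the required symbol, insert the required symbol."""
    if pos < len(tokens) and tokens[pos] in FOLLOWS.get(required, set()):
        return tokens[:pos] + [required] + tokens[pos:]
    return None


def no_repair(tokens: Tokens, pos: int, required: str) -> Optional[Tokens]:
    """Stand-in for ERRFPINTERPRETER that never finds a change."""
    return None


def putinstack(tokens: Tokens, pos: int, required: str,
               level1: Repair = follow_table_repair,
               level2: Repair = no_repair) -> Tokens:
    for attempt in (level1, level2):
        repaired = attempt(tokens, pos, required)
        if repaired is not None:
            return repaired
    # Level 3: insert the required symbol in front of the offending one.
    return tokens[:pos] + [required] + tokens[pos:]


print(putinstack(["IF", "A"], 0, ";"))  # [';', 'IF', 'A']
```

In the sample call, level 1 reproduces the CHANGED/TO example of the previous section. The level 3 fallback produces the same shape of change without first checking that the next symbol can legally follow, which is why the text expects a further error one symbol later.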

3.4 Error Recovery in the Case of Same Parser Actions

Sometimes  it  happens  that  all  the  productions  of  the  group 
have  the  same  parser  action  (not  counting  following  TPN  productions).   In 
this  case,  the  error  production  at  the  end  of  the  group  takes  one  of  two 
courses  of  action.   First,  it  attempts  to  fix  the  source  string  by  using 
the  string  generator  procedure.   If  this  procedure  is  not  able  to  find  an 
acceptable  change  to  the  program  which  will  fix  the  error,  then  the 
second course of action is followed. This involves making the reduction
anyway, including the symbol on the top of the stack if it is not marked
as one of the symbols which can follow this nonterminal, and leaving it
out if it is so marked. An example of the message given to the
programmer  in  this  case  is: 
*****  REDUCED  TO  A  DECLARATIONTYPE 
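The "reduce anyway" decision depends only on whether the offending top symbol is a legal follower of the nonterminal. The Python sketch below assumes a simple list for the stack, with `start` marking where the partial construct begins and the offending symbol on top; `reduce_anyway` and the sample stack contents are hypothetical.

```python
from typing import List, Set, Tuple


def reduce_anyway(stack: List[str], start: int, nonterminal: str,
                  follows: Set[str]) -> Tuple[List[str], str]:
    """stack[start:] holds the partial construct with the offending symbol on top."""
    top = stack[-1]
    if top in follows:
        new_stack = stack[:start] + [nonterminal, top]  # top is left out
    else:
        new_stack = stack[:start] + [nonterminal]       # top is absorbed
    return new_stack, f"***** REDUCED TO A {nonterminal}"


stack, message = reduce_anyway(["s0", "x1", "x2", "?"], 1, "DECLARATIONTYPE", {";"})
print(stack)    # ['s0', 'DECLARATIONTYPE']
print(message)  # ***** REDUCED TO A DECLARATIONTYPE
```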

3.5  Error  Recovery  with  Procedure  ERRR 

All other cases use the procedure ERRR. This includes the error
production at the end of all groups which did not satisfy the conditions
for the special cases of either of the two preceding situations. It also
includes the error lookahead test at the end of a subgroup with lookahead
tests. This section describes the two methods used by ERRR, first the
string generator procedure and second the discarding of symbols based on
the symbols in the marker stack, and all the special situations that are
checked by the procedure.

Groups which do not fit either of the two special cases mentioned
above in sections 3.3 and 3.4 are concluded with an error production
consisting of the parser instruction ERRR. After each of the stack tests
has been applied without success, this production is used. The first
step in the recovery in this case is to attempt to correct the source
string by calling the string generating procedure. If this procedure
fails to find a correction which will fix the program, the ERRR
procedure then looks at the marker stack. It forms a symbol set which is
the union of all the following symbol sets associated with each of the
symbols in the marker stack. Markers for symbols which have been found
but not yet popped, because the end of the right-hand side has not yet
been reached, are marked as found by the FDNT stack comparison operator,
and these markers are not included in this symbol set.

The  source  symbols  are  then  skipped,  beginning  with  the  one  at 
which  the  error  was  detected,  until  one  is  found  which  is  a  member  of  the 
set  of  following  symbols  formed  above.   This  symbol  is  compared  with  the 
following  symbol  sets  of  each  entry  in  the  marker  stack  beginning  at  the 
top  to  find  which  marker  it  is  that  this  symbol  follows.   The  parsing 
stack  is  then  reduced  to  this  nonterminal  and  the  next  group  of  productions 
applied  will  be  the  NTPN  group  corresponding  to  this  nonterminal,  whose 
address is in the marker symbol. The marker stack is also reduced to
this  level.  This  amounts  to  assuming  that  all  the  symbols  skipped  were 
supposed  to  have  made  up  that  construct  identified  by  the  nonterminal 
to  which  they  were  reduced  and  that  some  gross  error  caused  them  to  be 
unparsable.   It  is  also  possible  that  the  error  was  such  that  the  string 
appeared  to  be  a  different  construct  and,  by  the  time  the  error  was 
detected,  the  parse  had  gone  irretrievably  far  down  the  wrong  path.  This 
second  attempt  at  recovery  nearly  always  gets  the  parser  out  of  the 
difficulty,  allowing  it  to  continue  the  parse. 
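
The two-stage skip can be sketched in Python as follows. This is a minimal
sketch under stated assumptions, not the thesis's code: the Marker record,
the token strings, and skip_recovery are hypothetical names, reducing the
parse stack is represented only by truncating the marker stack, and the
end-of-file and same-name checks discussed next are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Marker:
    nonterminal: str        # construct this marker stands for (hypothetical)
    follow: frozenset       # symbols that may legally follow the construct
    found: bool = False     # flagged by FDNT: found but not yet popped


def skip_recovery(markers: List[Marker], tokens: List[str],
                  pos: int) -> Optional[Tuple[int, int]]:
    """Skip input guided by the marker stack (markers[-1] is the top).

    Returns (resume_pos, marker_index), or None if the input runs out.
    """
    active = [i for i, m in enumerate(markers) if not m.found]
    # Union of the following-symbol sets of all markers not flagged as found.
    followers = set()
    for i in active:
        followers |= markers[i].follow
    # Skip source symbols, starting with the one where the error was detected.
    while pos < len(tokens) and tokens[pos] not in followers:
        pos += 1
    if pos == len(tokens):
        return None
    # Search from the top of the stack for the marker this symbol follows.
    for i in reversed(active):
        if tokens[pos] in markers[i].follow:
            return pos, i
    return None  # not reached: tokens[pos] is in the union


markers = [Marker("PROGRAM", frozenset({"EOF"})),
           Marker("STATEMENT", frozenset({";", "END"})),
           Marker("EXPRESSION", frozenset({")", ";"}), found=True)]
pos, i = skip_recovery(markers, ["x", "(", "y", ";", "END"], 1)
del markers[i + 1:]                    # marker stack reduced to that level
print(pos, markers[-1].nonterminal)    # 3 STATEMENT
```

Because the EXPRESSION marker is flagged as found, the ";" is attributed
to the STATEMENT marker below it rather than to the top entry.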

Procedure ERRR has some other measures which it applies to catch infinite
loops and other special situations which can occasionally arise from the
above recovery techniques. If a reduction would produce the same
nonterminal as is already on the top of the stack, then that reduction is
not made and the search for an acceptable following symbol is continued.
An end of file mark is always an acceptable following symbol; it causes a
reduction to the top marker symbol if it is not a legitimate following
symbol of one of the other markers in the stack. If ERRR is called four
times in succession without the scanner moving further down the input
stream, then the first level of recovery, the string generator, is
bypassed. Since the input is then skipped to a symbol which is guaranteed
to follow the nonterminal to which the reduction is made, that symbol will
be correct and the parse is assured of moving at least one symbol past the
hangup. A parse whose path depends entirely on semantic tests can get into
a situation in which the string generator creates a string for which the
semantic tests fail, causing the error to recur. The check just described
prevents this sort of situation from causing the parser to get stuck.

Other special considerations apply with regard to end of file marks and
program goal symbols. If ERRR is called a fifth time with an end of file
mark as the next symbol, it automatically terminates the parse by
transferring to the DONEWITHPASSL label, which has the same effect as the
EXIT parser instruction. If the top marker in the stack is the program
marker, which is followed only by an end of file mark, then rather than
skipping the rest of the program, the parser restarts the parse at that
point by transferring to the starting group of productions. This will
usually result in at least one more error, such as inserting a BEGIN,
but at least the rest of the program still receives a syntax check.
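
The loop checks of the last two paragraphs amount to a small counter. The
sketch below is hypothetical (the class and its return labels are invented
for illustration), and it assumes that the "fifth time" at an end of file
mark counts consecutive ERRR calls made without the scanner advancing,
which the text does not state explicitly.

```python
class ErrrGuard:
    """Hypothetical bookkeeping for the ERRR loop checks.

    Counts successive ERRR calls made without the scanner advancing.
    """

    def __init__(self) -> None:
        self.last_pos = None
        self.calls = 0

    def on_errr(self, scanner_pos: int, next_is_eof: bool) -> str:
        if scanner_pos == self.last_pos:
            self.calls += 1
        else:
            self.last_pos, self.calls = scanner_pos, 1
        if next_is_eof and self.calls >= 5:
            return "terminate"            # same effect as EXIT
        if self.calls >= 4:
            return "skip_only"            # bypass the string generator
        return "generate_then_skip"       # normal two-level recovery


guard = ErrrGuard()
print([guard.on_errr(10, False) for _ in range(4)])
# ['generate_then_skip', 'generate_then_skip', 'generate_then_skip', 'skip_only']
```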

The error production at the end of NTH and CNTH groups is always the ERNR
instruction. The action in this case is to mark the top stack symbol as a
nonterminal, so that it will not be considered in the generation of
strings or as a possible following symbol, and then to call the ERRR
procedure, which, apart from these two differences, behaves as described
above.

In subgroups (groups of productions with a common stack test,
differentiated by lookahead tests), the last production has a lookahead
test which catches the errors that would cause all the lookahead tests to
fail. In this case, which is signaled by the XLRR parser instruction, the
procedure ERRR is called if the lookahead test fails. The procedure ERRR
has a Boolean parameter which distinguishes the lookahead case from the
nonlookahead cases, both for the subsequent use of the TSKTSK procedure
and to determine whether or not to include the top stack symbol in the
recovery. If the group is one with a nonterminal stack test, such as NTH
or NTPN, the corresponding error lookahead parser instruction is XLNR.
This instruction does the same thing as the ERNR instruction before
calling the procedure ERRR.

This  section  has  discussed  the  two  methods  which  procedure 
ERRR  uses  to  recover  from  the  errors  which  do  not  satisfy  the  conditions 
for  the  situations  handled  by  the  methods  of  the  previous  two  sections. 
These  two  methods  are  the  use  of  the  string  generator  procedure  and,  if 
the  string  cannot  be  corrected  easily,  the  use  of  the  marker  stack  symbols 
to  control  the  skipping  of  symbols  until  one  is  found  which  can  follow 
one  of  the  marker  symbols.   Also  discussed  were  the  various  checks  that 
are  used  by  ERRR  to  prevent  the  parser  from  becoming  stuck  in  particular 
sequences  of  errors  which  can  occasionally  arise. 

3.6  The  String  Generator  Procedure  ERRFPINTERPRETER 

The  procedure  ERRFPINTERPRETER  calls  a  recursive  procedure 
TESTFP  which  examines  all  the  productions  of  a  group  in  order  to  generate 
all  the  strings  of  length  either  two  or  three  which  can  legally  come 
next  in  the  syntax.   Each  string  is  compared  with  the  next  four  input 
symbols  and  any  string  with  some  sort  of  match  is  saved.   These  strings 
are  then  examined  for  some  recognizable  patterns.   If  one  is  found,  the 
source  string  of  symbols  is  modified  to  be  like  the  pattern  of  the  string 
generated. 

The  string  generator  procedure  uses  the  parsing  table  to  find 
all  possible  two  or  three  symbol  strings  which  could  occur  at  this  point 
in  the  syntax.   It  uses  the  stack  comparison  field  and  the  parser  action 
field,  and  uses  a  copy  of  the  marker  stack.   Some  modifications  of  the 
parser  instructions  were  necessary  to  allow  them  to  be  used  to  generate 
strings  as  well  as  to  parse.   Because  the  lookahead  comparison  field  is
not  used  in  generating  strings  for  error  recovery,  it  is  necessary  to 
have  the  symbol  number  in  the  stack  test  field  of  the  TPN  productions  with 
an  ILVL  stack  test.   The  ILVL  null  stack  test  is  used  on  TPN  productions 
which  follow  a  production  with  a  lookahead  test  since  in  parsing  it  is  not 
necessary  to  test  the  stack  in  such  cases. 

Other  additions  are  due  to  the  need  to  apply  all  the  productions 
of  a  group  instead  of  just  one.   Hence  it  is  necessary  to  be  able  to  tell 
when  a  group  ends.   This  requirement  is  no  problem  with  those  groups  which 
end  with  an  error  production,  but  it  has  necessitated  the  various  null 
stack  tests  used  in  NTPN  and  CNTPN  groups  which  have  no  error  production. 
For  example,  in  applying  an  NTPN  group,  all  the  productions  through  the 
FDNT  and  CNONE  are  used  until  a  NONE  or  another  FDNT  comes  up  indicating 
a new NTPN group. All of these modifications are done by the table-building
program, FPL2PAR/TWS, in the preprocessing of the syntax.

Each  string  found  which  bears  some  resemblance  to  the  actual 
string  is  saved  in  a  table.   Each  entry  consists  of  three  12-bit  fields 
which  contain  the  three  symbols  of  the  string  and  a  9-bit  field  which 
contains  a  coded  representation  of  the  extent  to  which  this  string  matches 
the  given  string.   The  four  symbols  of  the  given  string  are  numbered  0 
through  3.   The  9-bit  field  consists  of  three  3-bit  fields,  one  for  each 
of  the  symbols  of  the  generated  string.   If  one  of  the  symbols  of  the 
generated  string  matches  one  of  the  symbols  of  the  given  string,  the 
number  of  the  given  symbol  is  put  in  the  3-bit  field  for  that  generated 
symbol;  otherwise  a  7  is  put  in  the  3-bit  field. 
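
The table entry layout can be illustrated with a small packing routine.
This sketch assumes symbols are plain integers below 4096 and compares them
by equality, ignoring class-symbol matching; it takes the first matching
position when a symbol occurs twice in the given string, a detail the text
leaves open. The function names are hypothetical.

```python
NO_MATCH = 7  # 3-bit code: generated symbol matches none of given symbols 0-3


def match_codes(generated, given):
    """One 3-bit code per generated symbol: index of a match in given, else 7."""
    window = list(given[:4])
    return [window.index(s) if s in window else NO_MATCH for s in generated]


def pack_entry(symbols, codes):
    """Pack three 12-bit symbol numbers and three 3-bit codes (45 bits)."""
    assert len(symbols) == len(codes) == 3
    word = 0
    for s in symbols:
        assert 0 <= s < 1 << 12
        word = (word << 12) | s
    for c in codes:
        assert 0 <= c < 1 << 3
        word = (word << 3) | c
    return word


def unpack_entry(word):
    """Inverse of pack_entry: return (symbols, codes)."""
    codes = [(word >> shift) & 0o7 for shift in (6, 3, 0)]
    symbols = [(word >> shift) & 0xFFF for shift in (33, 21, 9)]
    return symbols, codes


codes = match_codes([5, 40, 21], [12, 13, 40, 21])
print("".join(map(str, codes)))   # 723
```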

If  after  two  symbols  the  pattern  is  "77",  i.e.  the  two  symbols
did  not  match  any  of  the  symbols  of  the  given  string,  then  this  path 
is  discontinued  and  not  put  in  the  table.   If  the  pattern  is  "70"  to 
"73"  and  the  second  symbol  is  a  class  symbol,  identifier,  or  number, 
the  string  is  extended  to  three  symbols.   If  the  string  has  only  one 
match  with  the  given  string  and  that  match  is  a  class  symbol,  the  string 
is  not  saved  in  the  table.   Consequently  the  table  will  contain  all  two 
or  three  symbol  strings  in  which  there  are  two  matches  with  the  given 
string  and  all  two  symbol  strings  with  one  match  which  is  not  a  class 
symbol.   Under  the  control  of  a  control  card,  the  string  generator  can 
extend  all  strings  to  three  symbols,  instead  of  just  those  mentioned 
above  involving  a  class  symbol. 
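
The extension and saving rules can be written as a decision function over
the match codes. It is a sketch only: after_two, worth_saving, and the
is_class predicate are hypothetical, and extend_all stands in for the
control-card option.

```python
NO_MATCH = 7


def is_class(sym):
    """Hypothetical class-symbol test."""
    return sym in {"identifier", "number"}


def worth_saving(symbols, codes, is_class):
    """Reject strings with no match, or whose only match is a class symbol."""
    matched = [s for s, c in zip(symbols, codes) if c != NO_MATCH]
    return bool(matched) and not (len(matched) == 1 and is_class(matched[0]))


def after_two(symbols, codes, is_class, extend_all=False):
    """Decide the fate of a two-symbol string from its match codes.

    Returns 'discard' (stop this path), 'extend' (generate a third symbol),
    'save' (enter it in the table) or 'drop' (not saved).
    """
    if codes == [NO_MATCH, NO_MATCH]:
        return "discard"                     # pattern "77"
    if extend_all or (codes[0] == NO_MATCH and is_class(symbols[1])):
        return "extend"                      # patterns "70" to "73"
    return "save" if worth_saving(symbols, codes, is_class) else "drop"
```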

The  string  generator  is  implemented  with  a  recursive  procedure 
TESTFP  that  consecutively  adds  each  of  the  stack  test  symbols  of  a  group 
to the current string. For each symbol it adds, it carries out the parser
action field, calling TESTFP for the next group. Since the markers cannot
actually  be  popped  but  must  remain  available  for  other  paths,  those  that 
are  to  be  popped  are  simply  marked  with  the  current  recursive  nesting 
level.   After  applying  the  parser  action  field,  TESTFP  unmarks  any 
markers  marked  as  popped  for  that  level  or  greater  and  applies  the  next 
production  of  that  group. 
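
The mark-instead-of-pop technique is essentially backtracking with an undo
step. The toy table below is an assumption made for illustration: a group
is a list of stack-test symbols, each paired with an action that is either
another group or "POP", meaning continue in the group saved in the top live
marker. It is far simpler than the real parsing table, but it shows why
popped markers must be unmarked before the next production is tried.

```python
def generate(groups, group, markers, length, prefix=None, out=None, level=0):
    """Enumerate all symbol strings of the given length from a toy table.

    groups: name -> list of (symbol, action); action is a group name or "POP".
    markers: list of [group_name, popped_level]; popped_level None = live.
    """
    prefix = [] if prefix is None else prefix
    out = [] if out is None else out
    if len(prefix) == length:
        out.append(tuple(prefix))
        return out
    for symbol, action in groups.get(group, []):
        prefix.append(symbol)
        if action == "POP":
            live = [m for m in markers if m[1] is None]
            if live:
                live[-1][1] = level          # mark instead of popping
                generate(groups, live[-1][0], markers, length,
                         prefix, out, level + 1)
        else:
            generate(groups, action, markers, length, prefix, out, level + 1)
        for m in markers:                    # unmark pops made at this level
            if m[1] is not None and m[1] >= level:   # or any deeper level
                m[1] = None
        prefix.pop()
    return out


groups = {"assign": [("<-", "expr")],
          "expr": [("identifier", "POP"), ("number", "POP")],
          "after_stmt": [(";", "stmt"), ("END", "stmt")]}
markers = [["after_stmt", None]]
strings = generate(groups, "assign", markers, 3)
```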

The  initial  call  of  TESTFP  is  affected  by  the  type  of  error 
situation  in  which  the  string  generator  is  called.   Each  time  the  parser 
action  transfers  to  a  new  group,  the  starting  address  of  that  group  is 
saved.   In  an  ERRR  type  error,  this  address  is  used  as  the  starting 
point  of  the  group.   When  the  string  generator  is  called  from  the 
PUTINSTACK  procedure,  only  the  current  production  is  used  in  the  initial
TESTFP  call.   In  the  case  of  the  string  generator  being  called  from  a 
lookahead  error,  only  those  productions  with  the  same  stack  test,  the 
subgroup  in  which  the  lookahead  failed,  are  considered  as  the  group  for 
the  initial  TESTFP  call. 

After the strings which could apply have been generated, the string
generator searches the table, in order, for any occurrence of particular
patterns. These patterns are given in Figure 14 for the normal case and in
Figure 15 for the extended (three-symbol) string case. If the first
pattern, "70" in the normal case or "701" in the extended case, comes up
during the generation process, the generation of strings is stopped and
the string just generated is used. Otherwise, the first recognizable
pattern found is used as the basis for modifying the program. The program
is changed to be equivalent to the generated string, the program before
and after the change is described to the programmer, the parser table
address is set to the one which was used as the starting address for the
initial TESTFP call, and the string generator procedure returns the value
true.
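
Choosing a correction from the saved strings can be sketched as a priority
search. The sketch assumes that the priority is the order in which the
patterns are listed in Figures 14 and 15; the early stop on "70" or "701"
during generation is not modelled, and the names are hypothetical.

```python
NORMAL_ORDER = ["70", "71", "12", "10", "72", "23"]               # Figure 14
EXTENDED_ORDER = ["701", "712", "123", "102", "770", "120",
                  "201", "301", "12", "723", "23", "771", "772"]  # Figure 15


def choose_correction(table, order):
    """Return the table entry whose pattern comes earliest in order.

    table: list of (symbols, pattern) pairs, e.g. (("<-", "identifier"), "72").
    """
    for wanted in order:
        for symbols, pattern in table:
            if pattern == wanted:
                return symbols, pattern
    return None                      # caller falls back to other recovery
```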

For example, assume that "ID1 := ID2 + 3" is a statement in a language in
which ":=" is supposed to be the single symbol "←", and that after seeing
"ID1" the parser follows a path in which ":" is not valid. It would get the
four symbols ": = ID2 +" for comparison. When it generates the string
"← identifier", it has the pattern 72, in which the second symbol is a
class symbol, so it goes ahead one more symbol and finally gets the pattern
723 for the string "← identifier +". In this case the first pattern it
recognizes when searching the table of generated strings will be the 72
(or 723 in the extended string mode). It will then replace the symbols in
positions 0 and 1 with the symbol corresponding to the "←". The messages
printed will be

*****  ERROR  AT  CHARACTER  ":"  ***************************************
*****  CHANGED:  :  =  ID2  +
*****       TO:  ←  ID2  +
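
The worked example can be reproduced with a few lines of Python. The
sketch assumes a single class symbol, identifier, that matches any
alphanumeric name, and a rewrite rule (hypothetical, not taken from the
thesis) that replaces the window up to the last matched position with the
generated symbols, keeping the matched source tokens.

```python
NO_MATCH = 7


def matches(gen_sym, token):
    """Hypothetical matcher: 'identifier' is a class symbol matching any name."""
    if gen_sym == "identifier":
        return token[:1].isalpha() and token.isalnum()
    return gen_sym == token


def pattern(generated, given):
    """Match code per generated symbol: first matching position, else 7."""
    return [next((i for i, t in enumerate(given) if matches(g, t)), NO_MATCH)
            for g in generated]


def apply_correction(generated, codes, given):
    """Rewrite the four-symbol window to read like the generated string."""
    last = max((c for c in codes if c != NO_MATCH), default=-1)
    fixed = [given[c] if c != NO_MATCH else g for g, c in zip(generated, codes)]
    return fixed + given[last + 1:]


given = [":", "=", "ID2", "+"]
codes = pattern(["←", "identifier", "+"], given)
print("".join(map(str, codes)))                                 # 723
print(apply_correction(["←", "identifier"], codes[:2], given))  # ['←', 'ID2', '+']
```

The same rule covers other Figure 14 patterns: for "10" it swaps the first
two symbols, and for "70" it inserts the generated symbol in front of the
window.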

If  none  of  the  acceptable  patterns  match  any  of  the  ones  entered 
into  the  table,  the  string  generator  procedure  returns  the  value  false 
and  the  calling  routine  takes  another  course  of  action.  These  other  courses 
of  action  have  been  mentioned  previously  for  PUTINSTACK,  ERRR,  and  ERRN. 

This  section  has  shown  how  the  string  generator  procedure  is  used 
to  generate  legal  strings  at  the  point  of  the  error  and  how  these  strings 
are  then  examined  for  the  best  match  with  the  program  symbols  in  order  to 
correct  the  program  for  the  compiler's  recovery  and  for  the  description 
of  the  error  to  the  programmer. 


Pattern   Interpretation

70        a symbol was left out
71        the wrong symbol was used, possibly a misspelling of a reserved word
12        an extra symbol was inserted
10        two symbols were interchanged
72        two symbols were used where one other was required
23        two extra symbols were inserted

Two-symbol String Patterns for Error Correction
Figure 14

Pattern   Interpretation

701       a symbol was left out
712       the wrong symbol was used, possibly a misspelling of a reserved word
123       an extra symbol was inserted
102       two symbols were interchanged
770       two symbols were left out
120       three symbols out of order
201       three symbols out of order
301       three symbols out of order
12        an extra symbol was inserted
723       two symbols were used where one other was required
23        two extra symbols were inserted
771       one symbol was used where two others were required
772       two symbols were used in place of two others

Three-symbol String Patterns for Error Correction
Figure 15

3.7  Summary

The error recovery of the modified Floyd production language consists of
two phases: the conditions that are determined during the preprocessing of
the syntax and are built into the parsing table, and the procedures in the
compiler which do the error recovery. The error conditions for which
parser instructions are built into the parsing table fall into three
groups: the error productions at the end of groups, which include the
parser instructions ERRN, ERRR, and ERNR; the unique symbol stack test
instructions, XSIS and XSIB; and the lookahead test instructions used at
the end of the subgroups with lookahead, XLRR and XLNR. Each of these
three groups of instructions uses the procedure TSKTSK to print the
initial part of the error message, but the recovery measures taken vary.
The XSIS and XSIB instructions call PUTINSTACK, which first uses the table
of following symbols, then the procedure ERRFPINTERPRETER, and last just
inserts the symbol. Instruction ERRN first uses procedure
ERRFPINTERPRETER, then makes a reduction anyway. The rest of the error
parser instructions use procedure ERRR, which first tries the string
generator procedure ERRFPINTERPRETER and, if that fails to provide any
recovery, discards a portion of the input based on the symbols in the
marker stack.

These  are  the  basic  mechanisms  described  in  this  chapter.   Their 
effectiveness  is  discussed  in  the  succeeding  chapters. 

CHAPTER  4.   DEMALGOL  COMPILER

In order to test and develop both the translator writing system and its
error recovery portion, a small language was used. This small language,
called DEMALGOL, is essentially a subset of ALGOL. In this section, the
syntax and semantics of DEMALGOL are given. The rest of this chapter
discusses the error recovery of the DEMALGOL compiler. The results of this
compiler are compared with the results of two ALGOL compilers for the same
programs.

DEMALGOL includes three declaration types: INTEGER, BOOLEAN, and LABEL;
most basic statements: assignment, block, compound statement, go to, and
conditional; and both Boolean and arithmetic expressions with OR, AND,
NOT, relational operators, +, -, ×, /, and unary -. Besides the LABEL
declaration, there are three additions to DEMALGOL beyond ALGOL-60:
(1) in the definition of a block, a ";" terminates each <statement> rather
than just separating <statement>s; (2) declaration type <identifier> has
been added; and (3) a program is concluded with a "." following the last
END. The first change was made for compatibility with the recursive
descent compiler to be described in Chapter VIII. The second change was
made to improve the error recovery in the case where someone used a
non-DEMALGOL declaration type such as REAL, FILE, ARRAY, etc. The third
change was made to allow DEMALGOL programs to be compatible with the
Burroughs ALGOL compiler. The syntax of DEMALGOL is given in Figure 16.

Only some basic semantic functions have been implemented in the compiler.
Identifiers are marked in the symbol table as integer, Boolean, or label
according to their last declared occurrence. The left hand side identifier
of an assignment statement is tested for the integer or Boolean attribute
to resolve a local ambiguity in the parser. If at that time the identifier
is undeclared, it is assumed to be integer and marked as such in the
symbol table, and a message regarding this error is printed. When the
declaration type <identifier> is used, the semantic routine prints a
message stating that this declaration type is not allowed, and the
subsequent identifiers remain undeclared. The semantics of DEMALGOL are
given in Figure 17.

<PROGRAM> ::= <BLOCK> .
<BLOCK> ::= <BEGIN> <DECLARATION LIST> <STATEMENT LIST> END
<BEGIN> ::= BEGIN <COMMENT> ; | BEGIN
<COMMENT> ::= <COMMENT> <any terminal except semicolon> | COMMENT
<DECLARATION LIST> ::= <DECLARATION LIST> <TYPE> <TYPE LIST> <SEPARATOR>
        | <TYPE> <TYPE LIST> <SEPARATOR>
<TYPE> ::= INTEGER | LABEL | BOOLEAN | <identifier>
<TYPE LIST> ::= <TYPE LIST> , <identifier> | <identifier>
<SEPARATOR> ::= ; <COMMENT> ; | ;
<STATEMENT LIST> ::= <STATEMENT LIST> <STATEMENT> <SEPARATOR>
        | <STATEMENT> <SEPARATOR>
<STATEMENT> ::= <identifier> : <STATEMENT> | <BLOCK>
        | <GO TO SYMBOL> <identifier>
        | <identifier> <ASSIGN ARROW> <BOOLEAN EXPRESSION>
        | <identifier> <ASSIGN ARROW> <ARITHMETIC EXPRESSION>
        | IF <BOOLEAN EXPRESSION> THEN <STATEMENT> ELSE <STATEMENT>
        | <empty>
<GO TO SYMBOL> ::= GO TO | GOTO | GO
<ASSIGN ARROW> ::= := | ←
<BOOLEAN EXPRESSION> ::= <BOOLEAN EXPRESSION> OR <BOOLEAN TERM>
        | <BOOLEAN TERM>
<BOOLEAN TERM> ::= <BOOLEAN TERM> AND <BOOLEAN PRIMARY> | <BOOLEAN PRIMARY>
<BOOLEAN PRIMARY> ::= <identifier> | TRUE | FALSE | (<BOOLEAN EXPRESSION>)
        | <ARITHMETIC EXPRESSION> <RELATION> <ARITHMETIC EXPRESSION>
<RELATION> ::= = | ≠ | < | > | ≤ | ≥
<ARITHMETIC EXPRESSION> ::= <ARITHMETIC EXPRESSION> <ADD> <TERM> | <TERM>
<TERM> ::= <TERM> <MULTIPLY> <PRIMARY> | <PRIMARY>
<PRIMARY> ::= <identifier> | <number> | (<ARITHMETIC EXPRESSION>)
<ADD> ::= + | -
<MULTIPLY> ::= × | /

Syntax of DEMALGOL
Figure 16

DEMALGOL is a small language, and the only semantics implemented are
those necessary for parsing. However, it is large enough to contain a
variety of syntactic constructs and so has been useful in testing and
demonstrating the effectiveness of the error recovery system described in
the previous chapter.

4.2     Error  Recovery  in  DEMALGOL

The examples described in this section and the next are from seven
programs. One of these was written by the author with five intentional
errors, while the others were written by novice programmers who were
attempting to write correct programs. This section discusses the
effectiveness of the DEMALGOL compiler in recovering from the errors in
these programs and in describing them to the programmers. The next
section summarizes all the errors and the recovery in each case.

The only time the compiler could not make any sense at all of the program
was with the string "FILE DATA (2, 10);". The mistake was in declaring a
file. When the word FILE was read, a message was printed by a semantic
routine telling the programmer that this declaration type was not
allowed. An error was also detected at the left parenthesis. A pattern
involving inserting two symbols was found, and "; IF" was inserted after
"DATA". The "," was changed to a "+" at a subsequent error. Finally an
error was detected at the ";". Since this was followed by "LABEL ENDA,",
no string of symbols generated resembled the given string. Hence the
compiler assumed it had found a <statement list> followed by the ";" and
started looking for another <statement>. "LABEL" was an error as the
beginning of a statement, but this was easily fixed by inserting a
"BEGIN". At the end of the program, it was then necessary to insert a
matching "END".

at declaration type INTEGER
     DECLARATIONTYPE := 1;
at declaration type BOOLEAN
     DECLARATIONTYPE := 3;
at declaration type LABEL
     DECLARATIONTYPE := 2;
at declaration type <identifier>
     BEGIN DECLARATIONTYPE := 15; TSKTSK(FALSE, FALSE);
        WRITE(LINE, "***** THIS DECLARATION TYPE NOT ALLOWED.") END;
at each identifier in the declaration type list
     IF DECLARATIONTYPE = 15 THEN ELSE mark this entry in the symbol
        table with DECLARATIONTYPE;
to test for BOOLEAN declared primary identifiers
     SEMANTICTEST := is this entry in the symbol table marked = 3;
to test for labels at the beginning of statements
     SEMANTICTEST := is this entry in the symbol table marked = 2;
after label in <go to statement>
     IF J := this entry in symbol table marked = 2 THEN ELSE
     BEGIN TSKTSK(FALSE, FALSE);
        WRITE(LINE, "***** THIS LABEL WAS NOT DECLARED.");
        IF J = 0 THEN mark this entry in the symbol table with 2; END;
at arithmetic expression primary identifiers
     IF J := this entry in the symbol table marked = 1 THEN ELSE
     BEGIN TSKTSK(FALSE, FALSE);
        WRITE(LINE, "***** THIS IDENTIFIER WAS NOT DECLARED INTEGER.");
        IF J = 0 THEN mark this entry in the symbol table with 1; END;

Semantics of DEMALGOL
Figure 17

Although in most cases generating two-symbol strings for error recovery
gives results equal to or poorer than those of generating three-symbol
strings, in this case it would have been better. None of the possible
two-symbol strings generated at that point would have matched the given
string, so the compiler would have scanned to the ";" and reduced the
skipped symbols to a <type list>. This would have avoided all the extra
errors mentioned above and would have been clearer to the programmer.

In all these example programs, the most frequent mistake was leaving out a
symbol; this was done 41 times. Eight times the wrong symbol was used; six
times an extra symbol was put in; and once two symbols were interchanged.
On three of these occasions, there was another error within the range of
the context used to analyze the first error. In each of these, the first
error was corrected in such a way that the second error disappeared too,
once exactly as the programmer had intended. Also, in one place, three
errors were each affected by a subsequent error as the programmer made
four mistakes in a row. The first error was corrected as the programmer
intended, and the correction of the second eliminated the third, leaving
the fourth as an isolated error which was handled correctly. Of the
remaining 52 errors, four were not corrected exactly as the programmer had
intended, although in three of the four the effect was the same as that of
the perfect correction. Three more of the 52 errors were caught by the
declaration type <identifier> and flagged by the semantic routine as an
invalid declaration type. This means that in only one of the 53 isolated
error situations did the compiler make the wrong adjustment to the
program. This was the string "IF M # THAN 55". Instead of producing
"IF M / 55" it produced "IF M 4 THAN + 55".

These results show that this compiler was able to tell the programmer what
his mistake was in 98% of the isolated errors and in about 90% of all the
errors.

4.3     Summary  of  the  Errors

In  this  section,  all  the  errors  in  the  seven  DEMALGOL  programs 
will  be  presented  along  with  the  compiler's  recovery  from  them.   The 
recoveries  will  be  rated  and  the  effectiveness  of  the  compiler  for  these 
programs  will  be  calculated. 

Each error is identified, followed in parentheses by the number of times
it occurred if it occurred more than once. The construct that was actually
intended is then given if this is not clear from the statement of the
error. The line concludes with the recovery made by the compiler and the
rating of that recovery according to the ratings given in Chapter I:
"E" means the compiler changed the program to that which the programmer
intended; "G" means the compiler did not change the program to that which
the programmer intended, but the change it made was such that the
programmer should have no trouble identifying his error; "F" means the
description of the compiler's recovery would not help the programmer much
in identifying his actual error; "P" means the compiler could have misled
the programmer as to the identity of his error.

The errors made in the seven programs and the compiler's recovery measures
are as follows:

; left out after either <declaration> or <statement> (19 times)
     ... missing ; inserted (E)
ELSE left out (8 times) ... ELSE inserted (E)
THEN left out (3 times) ... THEN inserted (E)
. left out (6 times) ... . inserted (E)
initial BEGIN left out ... BEGIN inserted (E)
: after label left out ... : inserted (E)
: of assignment operator left out ... : inserted (E)
operator missing in I ← I + 4J; ... multiply sign inserted (E)
=: ... changed to := (E)
GO TO 6; ... changed to GO TO <identifier> (E)
GO TO 7E; ... 7 deleted (E)
I := TRUE; (I declared INTEGER) ... changed to I := <identifier>; (E)
IF L := 0 ... : deleted (E)
FILE, REAL, LABLE ... declaration types not valid but allowed
     syntactically by <TYPE> ::= <identifier>. Semantic routine printed
     invalid type message. (E, E, E)
THE DAY := ... DAY deleted (Actually the error here is one at the lexical
     analysis level, since THE DAY was meant to be the single identifier
     THEDAY.) (G)
IF THEDAY + JAN 1 > 7 ... multiply operator inserted before the 1 (Here
     again it is actually a lexical error.) (G)
GO TO MULTIPLY END ; ... END and ; interchanged, and then, after an error
     was detected following the END, a ; was inserted following the
     END (G)
IF M / THAN 55 ... + inserted before 55 (G)
SUM ← SUM + 10 END; IF ... a ; is missing after the 10, and the END; is
     extraneous since there has been only the one BEGIN at the beginning
     and this is not the end of the program ... The recovery treated this
     as one error and deleted the END. (E)
GO TO MIX ENDA; ... a ; is missing after MIX and ENDA is the label on the
     next statement, i.e. the ; is intended to be a : ... Again the
     recovery treated this as one error and deleted the ENDA. (G)
A conditional statement was of the form IF <EXPRESSION> THEN <STATEMENT>
     with ELSE <STATEMENT> ; missing (since <STATEMENT> ::= <empty>, we
     can consider ELSE ; as missing) ... The recovery inserted only the
     ELSE, causing the next statement to follow the ELSE. (G)
I + A followed by A + 1 ... The programmer intended this to be I ← A; and
     A ← 1; ... In the three-symbol string mode this was changed, after
     three errors were flagged, to I ← 0 + A × A + 1; Due to the way the
     implementation was handled, the two-symbol string error recovery
     changed this to I ← A × A + 1; If the two-symbol string error
     recovery had not allowed the pattern 710 as an acceptable pattern
     with the second symbol an identifier, and then checked only the 71
     when the program was changed, it would have found no acceptable
     pattern for the first + and, since the top marker was
     <ASSIGN ARROW>, it would have skipped to the first A and called the
     intervening + an <ASSIGN ARROW>. Note that the effect is the same,
     but the way it was actually done is clearer to the programmer than
     the message:
     *****  REDUCED  1  SYMBOL  TO  A  ASSIGNARROW
     (E, G, E)

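Table 1 gives a tabulation of the above results. From this
table, the measure of effectiveness given in Chapter 1 can be calculated.
This formula was:

$$\text{Effectiveness} = \frac{E + \frac{3}{4}G + \frac{1}{2}F + \frac{1}{4}P}{N + M} \cdot \frac{N}{N + X}$$

where E, G, F, and P are the numbers of detected errors whose recoveries
were rated excellent, good, fair, and poor, N = E + G + F + P, M is the
number of errors missed, and X is the number of extraneous errors.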
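The formula is simple enough to check mechanically. The Python function below is a sketch for recomputing the effectiveness values reported in this thesis from the counts E, G, F, P, M, and X; the function name is illustrative only, and exact fractions are used so that rounding happens only at the end.

```python
from fractions import Fraction


def effectiveness(E, G, F, P, M, X):
    """Effectiveness measure: weighted diagnostic quality, penalized for
    missed errors (M) and for extraneous errors (X)."""
    N = E + G + F + P  # errors actually detected
    # Weights 1, 3/4, 1/2, 1/4 for excellent, good, fair, poor recoveries.
    quality = Fraction(E) + Fraction(3, 4) * G + Fraction(1, 2) * F + Fraction(1, 4) * P
    # First factor divides by all real errors (detected + missed);
    # second factor scales by the share of reports that were genuine.
    return float(quality / (N + M) * Fraction(N, N + X))


# DEMALGOL counts from Table 1
print(round(effectiveness(51, 8, 0, 0, 0, 5), 3))  # 0.891
print(round(effectiveness(51, 8, 0, 0, 0, 1), 3))  # 0.95
```

The second call reduces X from 5 to 1, which corresponds to ignoring the four extra errors caused by the FILE declaration.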

From the above table, E = 51, G = 8, F = 0, P = 0, M = 0, X = 5. Hence
Gross errors: 1

Number of symbols involved: 5
Number of symbols skipped: 5
Subsequent errors: 4

Multiple  errors : 
Doubles :   3 

First  recovery  fixed  second  error: 
Triples :   1 

First  error: 

Second  fixed  third: 

Single  errors:   53 

Subsequent  errors:   1 

Totals: 

Actual  errors:   63 
Compiler error count: 64
Extraneous  errors :   5 
Errors  missed:   0 


E  G         F         P 

0  10  0 


10  0  0 

0  10  0 


49


0  0 


51         8         0         0 


Table   1.      Table   of  DEMALGOL  Syntax  Errors 
the effectiveness of the DEMALGOL compiler on these programs is .891.
If we were to ignore the four extra errors produced as a result of the
FILE declaration, the effectiveness would be .95. Appendix 1 gives
listings of the DEMALGOL programs.

4.4 Comparison of Burroughs ALGOL Compiler with DEMALGOL Compiler

The seven programs mentioned above were also run on the Burroughs
ALGOL compiler. This was possible because DEMALGOL is a subset
of Burroughs Extended ALGOL. This section presents the results of
these runs.

The errors detected and the corresponding diagnostic messages
were as follows:

; missing (8 times) ... "missing ';' or END" (E)

THEN  missing  (2  times)  ...  "missing  THEN"  (E) 

THE  DAY  ...  "undeclared  identifier"  (E) 

JAN  1  ...  "undeclared  identifier"  (E) 

x  (multiply  sign)  missing  ...  "missing  ';'  or  END"  (G) 

IF  L  :=  0  THEN  ...  "expression  not  of  type  Boolean"         (G) 
GO TO 6 ... "expression not of designational type" (G)

GO  TO  7E  ...  "expression  not  of  designational  type"         (G) 
+   THAN  ...  "undeclared  identifier"  (G) 

I := TRUE ... "primary may not begin with a quantity of this type" (F)

final . missing (6 times) ... compiler terminated with an end of file error (P)
This is a total of 24 errors. Twelve were rated "E", five were rated
"G", one was rated "F", and six were rated "P". The total is less than
that for the previous section since some of the former errors are valid
constructs in ALGOL. Since the recovery used is to skip to the next
semicolon, the compiler missed 21 errors as it skipped over them. Although
there were no extra errors created here, this compiler has been known to
create hundreds of extra errors. The value of the effectiveness formula
for this set of recoveries is .394. The above ratings do not take into
account the fact that the error messages actually give just a number which
is used as a reference to an external listing of the error messages. If
this listing is unavailable, the message given the programmer is simply
that an error occurred. This error diagnostic is probably a "fair"
description of the programmer's error. The author has frequently heard
someone saying something like "What is error 114?". By assuming all the
excellent and good recoveries are just fair, the value of the effectiveness
formula for these recoveries with a listing of the error messages
unavailable is .233. Obviously the automatic error recovery functioned
much better than the hand-coded techniques used in the ALGOL compiler.
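The way a skip-to-the-next-semicolon recovery loses errors can be illustrated with a short sketch of this kind of recovery (commonly called panic mode). It is a generic illustration, not a model of the Burroughs compiler: the token list, the positions marked as erroneous, and the function name are all hypothetical. After reporting an error, the sketch discards every token up to the next ;, so any other error inside the discarded stretch is never reported.

```python
def panic_mode(tokens, error_positions, sync=";"):
    """Return (reported, missed) error positions under skip-to-sync recovery.

    tokens: list of token strings.
    error_positions: indices of tokens assumed (hypothetically) to be in error.
    """
    errors = set(error_positions)
    reported, missed = [], []
    i = 0
    while i < len(tokens):
        if i in errors:
            reported.append(i)
            # Discard tokens up to the next synchronizing symbol;
            # any error found in this stretch is lost.
            j = i + 1
            while j < len(tokens) and tokens[j] != sync:
                if j in errors:
                    missed.append(j)
                j += 1
            i = j + 1  # resume parsing after the synchronizing symbol
        else:
            i += 1
    return reported, missed


tokens = "I := A B := 1 ; C := 2 + ;".split()
print(*panic_mode(tokens, error_positions=[3, 5, 10]))  # [3, 10] [5]
```

Here the error at position 5 lies between the first reported error and the next ;, so it is silently skipped, which is the mechanism behind the missed errors counted above.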

4.5 Comparison of 7090 ALCOR Compiler with DEMALGOL Compiler

The 7090 ALCOR-Illinois compiler developed by Gries, Paul,
and Wiehle (50) was also available for comparison, running on an IBM
7094. This section gives the results of running the seven DEMALGOL
programs discussed above on this compiler.

This compiler is based on a transition matrix parsing algorithm
(48) and hence has the same advantage as the modified Floyd production
language parsing algorithm used in the DEMALGOL compiler, namely that of
immediate error detection. The subroutines which handle the error recovery
construct the error messages from skeleton forms, as the DEMALGOL
compiler does, using the symbols given and the names of the nonterminal
symbols defining the language. In its effect, the ALCOR compiler is more
like the DEMALGOL compiler than the Burroughs ALGOL compiler. It missed
only a few errors, but it generated quite a few extraneous errors and
the error messages were sometimes vague.

The compiler detected 104 errors, of which 54 were extraneous,
but it missed six errors. Of the 50 valid errors which it detected, the
error descriptions received ratings as follows: there were 27 with an
"E" rating, 15 with a "G" rating, seven with an "F" rating, and one with
a "P" rating. By the effectiveness formula given in Chapter 1, this
compiler on these programs has an effectiveness of .421. Sixteen of the
extra errors were due to the 'LABEL' declarations, in which all the
identifiers in the <type list> were flagged as undeclared. With these
sixteen extra errors ignored, the effectiveness value is .497. The specific
errors and the ALCOR compiler's descriptions of them can be found in
Appendix 2.

The  ALCOR  compiler  found  almost  all  the  errors  and  described 
more  than  half  of  them  excellently.   Except  for  the  large  number  of 
extraneous  errors,  it  performed  very  well. 

4.6 Summary Comparison of Three Compilers

The ALCOR compiler is more effective than the Burroughs ALGOL
compiler by the measures described in Chapter 1, but the DEMALGOL compiler
is significantly more effective than either of the other two. This is
illustrated by the summary given in Table 2.


                   DEMALGOL      Burroughs ALGOL   ALCOR-Illinois
Excellent (E)      51            12 (0)            27
Good (G)           8             5 (0)             15
Fair (F)           0             1 (18)            7
Poor (P)           0             6                 1
Missed (M)         0             21                6
Extra (X)          5 (1)         0                 54 (38)
Effectiveness      .891 (.95)    .394 (.233)       .421 (.497)

Table 2. Error Recovery Effectiveness of Three Compilers


This table gives for each of the three compilers operating on the seven
programs (1) the number of errors detected, arranged according to their
ratings; (2) the number of errors missed; (3) the number of extraneous
errors detected; and (4) the effectiveness value calculated by the formula:

$$\text{Effectiveness} = \frac{E + \frac{3}{4}G + \frac{1}{2}F + \frac{1}{4}P}{N + M} \cdot \frac{N}{N + X}$$

where N = E + G + F + P.

The numbers in parentheses give the alternative counts which were
discussed in the preceding sections and the resultant changes in the
effectiveness values.

These results demonstrate the practicability of the methods
of error recovery proposed in this thesis. These methods have been shown
to be superior to those of the other working compilers with which they
were compared.
CHAPTER 5. COMPILERS FOR THREE OTHER LANGUAGES

5.1  Introduction  to  the  Three  Languages  and  Their  Compilers 

Compilers for three other languages have been implemented
using the current version of this translator writing system with its
automatic error recovery. Another is nearly complete and some work has
been done on a fifth, although there is no data available at this time
on the effectiveness of the error recovery for either of these two.
Other compilers have been built in the past but, since these were
completed before the current version of the error recovery system was
implemented, they too offer no data for effectiveness studies. Each of
these has been a complete compiler, as opposed to being just a syntactic
processor as is the DEMALGOL compiler.

This  chapter  gives  a  brief  description  of  each  language,  a 
discussion  of  some  of  the  error  situations  encountered  by  each  compiler, 
and  a  summary  of  the  error  recoveries  from  all  the  available  programs  for 
each  language. 

5.2  TESLA 

5.2.1  Introduction  to  TESLA 

TESLA is a computer design diagnostics language developed by
Luther Abel, implemented by Nicole Allegre, and modified by Bill McTeer.
It allows the description of circuit board logic and simulates the output
signals which would be produced by that logic with a given set of input
signals. The hardware failure diagnostic work for the ILLIAC IV is being
done with the use of TESLA. A complete description of TESLA can be found
in Allegre (1), although for reference the syntax has been copied in
Appendix 3.

5.2.2  TESLA  Error  Recovery 

TESLA  has  a  quite  different  structure  from  the  other  languages 
considered  and,  because  of  the  manner  in  which  it  is  usually  used,  it 
is  subject  to  different  kinds  of  errors.  This  section  indicates  some 
of  the  special  error  situations  and  then  summarizes  the  error  recoveries 
in  the  runs  analyzed. 

Frequently the same program is used many times with only a few
cards changed each time. This happens when the same circuit logic is to
be tested with different input signals. As a result, a larger proportion
of TESLA errors than of errors in the other languages arise from cards
being out of order. Since there are usually several symbols on
a card, this means that to the parser two large sets of symbols have
been interchanged. The time needed to analyze a context of more than
three symbols is prohibitive except in very simple languages. Hence the
parser cannot usually identify the interchange of two cards as such,
and the error recovery must handle the error in some other way.

Another common cause of errors in TESLA is the position of the
sequence [<number> : <number>], which is called <CASE LIMITS>. It is
easy to get a <CASE LIMITS> out of order and, since this construct
consists of five symbols, it is larger than the context available for
analysis in error recovery. <CASE LIMITS> can therefore never be put in
the right position in one error recovery operation.
Only a few programs with syntactic errors were saved from
discard and could be used for the evaluation of the error recovery.
The following errors were obtained from those listings of runs of the
TESLA compiler which were saved:

;  missing  (3  times)  ...  ;  inserted  (E) 

final  END  missing  (2  times)  ...  END  inserted  (E) 

final  .  missing  (2  times)  ...  .  inserted  (E) 

DUMMY used where only <identifier> is allowed ...
changed DUMMY to <identifier> (E)

INPUT SIGC <valid set of symbols> ; , SIGC ... ; extra
... changed , to INPUT (equivalent to deleting ;) (E)

DIGIT A 2( ... = missing after A ... Because the scanning
mode is changed back and forth by some semantic
actions, the 2 was scanned as a character instead of a
number. Hence the recovery replaced the 2 by =.
Had the 2 been scanned as a number, the = would have
been inserted between A and 2. (E)

<SIGNAL LIST ELEMENT> (i.e. <identifier> ,) out of
order, coming after the end of the <SIGNAL LIST>
(4 times) ... the , was changed to a : making the
<SIGNAL LIST ELEMENT> a <STEP LABEL> (G)

, NOT ( ... [<number> : <number>] before NOT missing
... <identifier> inserted before NOT (also valid) (G)

CLOCK 3; ... <SIMULATION CONTROLS> missing ... changed
to REPEAT 3 <SIMULATION CONTROLS> (G)
INPUT 0:23 SIGA 2( ... identifier SIGA should be after
INPUT ... changed to INPUT <identifier> 0:23 SIGA,
<identifier>( (G)


card out of order ...

Given:              INPUT SIG12
                    STEP12202 :
                    FFOLD;
                    [0:0] ...
Should have been:   STEP12202 :
                    FFOLD;
                    INPUT SIG12
                    [0:0] ...
Changed to:         INPUT SIG12
                    STEP12202 ,
                    <identifier>
                    [0:0] ...          (G)
E46:46]2( ... should have been [46:46]2( ... changed to
E46[46:2)( (G)

NOT [23:23] ... should have been [23:23] NOT ...
changed to NOT <identifier> , [23:23] (F)

card sequence number in <INPUT GROUP ASSIGNMENT> list
n[12:12 ... changed to <identifier> n (12 A 1 2 (F)

Of these 21 errors, the recovery was excellent in 10 cases, good in
nine, and fair in two. There were six extraneous errors created and no
errors missed. The effectiveness of TESLA in these situations was:

$$\text{Effectiveness} = \frac{10 + 9 \cdot \frac{3}{4} + 2 \cdot \frac{1}{2}}{21} \cdot \frac{21}{21 + 6} = .657$$


5.3 ICL

5.3.1 Introduction to ICL

ICL, Illiac Control Language, was designed to be the language
for controlling the job description for execution on the Burroughs B6500
or ILLIAC IV. The compiler was originally designed using a recursive
descent compiler building system with no automatic error recovery features.
This necessitated a complete implementation of all the error recovery by
the language designers as they built the compiler. The working version of
the ICL compiler was modified to run on the modified Floyd production
translator writing system. Most of the work was involved in making the
semantic action calls equivalent, as only minor modifications were required
to the syntax. The productions that had been inserted for error recovery
were removed and only the semantic error description features were retained.
A description of ICL can be found in Pavis (77) and a complete syntax for
ICL is found in Appendix 4.

5.3.2 ICL Error Recovery

As the ILLIAC IV and its operating system have not yet been
completed, there do not exist any ICL programs other than the error-free
ones that were used in testing the ICL system. However, with three solicited
contributions and some of the author's tests, some indication of the
effectiveness of the error recovery can be given.

The error recovery was not as effective as the DEMALGOL error
recovery for several reasons: (1) the constructs were less familiar to
the programmers than those of DEMALGOL and hence the programmers tended
toward more complex mistakes; (2) the language, being larger, had greater
complexity, allowing the error recovery to produce a greater variety of
possibilities with an increased likelihood of an incorrect match fitting
the error situation; (3) the ICL compiler depends much more on the use
of semantic tests, which in error situations tend to direct the
compiler down the wrong path before an error is detected, so that when the
error is finally detected the recovery mechanism is not able to find the
correct path. Too late to be included in this thesis, ICL was modified to
eliminate completely the problem with semantic tests. This is expected to
improve considerably the error recovery in those situations influenced by
the semantic tests. Whereas the DEMALGOL compiler had to reject symbols
in only one case, the ICL compiler had to reject symbols in seven cases in
slightly more than half as many error situations. Nevertheless, in over
half of the situations the recovery was perfect, and in most of the rest the
recovery, although incorrect, clearly showed the programmer what his errors
were. Six errors were missed by the recoveries which had to skip symbols to
recover, and five extra errors were generated.

The  following  summary  lists  the  errors,  the  recoveries,  and  the 
ratings  assigned  by  the  author: 

FILE  XF[1000]  ...  =  needed  after  XF  (2  times)  ... 

=  inserted  (E) 

PRINT  (<string>  )  ...  <string>  should  not  be  included  in 

parentheses  (5  times)  ...  (  deleted  and  later  )  deleted  (E,E) 
; missing after <DECLARATION> or <STATEMENT> (4 times)
... ; inserted (E)

SANE AS ... misspelled, should be SAME ... changed
to SAME AS (E)

ILLIAC PROGRAM = COMPILEONU ... = extra ... deleted = (E)

ALL OF TRY ... BEGIN missing after OF ... BEGIN inserted
after OF (E)
<- used as assignment arrow instead of := (5 times)
... := inserted with two errors, one to insert : and
the second to replace <- with = (G)
(Although this recovery is perfect, it was given a G rating
because of the distraction to the programmer of having two
error messages for one error.)
THEN <STATEMENT> ; ELSE ... changed ELSE to BEGIN (G)

THEN <STATEMENT> missing (2 times) ... changed ELSE to THEN (G)
<STATEMENT> missing after THEN ... inserted <identifier>,
which was syntactically correct but, because the null
identifier failed all semantic tests, a parsing path
was chosen which required additional errors to insert
" := <identifier> " (G)

COMPILE  GLYPNIR  (  ...  COMPILE  is  not  an  ICL  key  word 
...  inserted  :=  after  COMPILE  in  two  errors,  one 
for  each  symbol 
GLYPNIR  (  INI  ,  CI  )  ...  changed  ,  to  = 
PROGRAM Y( INPUT, OUTPUT=X/Y) = BY ; ... The label
equation =X/Y is not allowed as part of the
parameter list ... inserted ) after OUTPUT.
This caused another error at the later ), in which
the recovery skipped to the ; (G)

, left out of <identifier> list in a declaration ...
skipped to the next , in the parameter list of the
next identifier. This caused the remaining
parameters to be declared as separate identifiers
and created one additional extraneous error (G), a rating
given since the recovery communicated well what the error
actually was.

label equation used in an <EXECUTION STEP> (4 times)

...  label  equations  are  allowed  only  in  declarations, 

not  in  executions  . . .  skipped  to  symbol  following  the 

execution  step:  THEN  (twice),  ELSE,  and  END.   This  caused 

it  to  miss  three  errors  and  create  one  extraneous  error.    (F) 

B6500  FILES  A,  B,  C;  ...  FILES  should  be  singular  ... 

skipped  to  ;  missing  three  errors  (each  of  the 

identifiers  was  to  have  a  label  equation)  (F) 

•  missing  after  an  <EXECUTION  STATEMENT>  (which  is  a 

single  program  name  with  parameters)  ...  inserted  the 

word  WITH  and  caused  two  extraneous  errors.   Because 

this  was  more  confusing  to  the  programmer,  it  was  rated  (P). 

This summary indicates 19 recoveries rated E, 12 rated G,¹ five
rated F, and one rated P. There were six errors missed and five extra
errors created, so the effectiveness of the error recovery was:

$$\text{Effectiveness} = \frac{19 + 12 \cdot \frac{3}{4} + 5 \cdot \frac{1}{2} + 1 \cdot \frac{1}{4}}{37 + 6} \cdot \frac{37}{37 + 5} = .630$$

¹ If the errors involving the use of <- instead of := are rated E, then this
measure changes to .655.
5.3.3 Error Recovery in the Recursive Descent ICL Compiler

The  same  programs  described  in  the  previous  section  were  run  on 
the  original  recursive  descent  ICL  compiler.   This  section  discusses  the 
results of these runs, which were much less satisfactory than the ones just
described.

Any situation which the language designer does not anticipate
causes the compiler to back up to the top and fail. This top failure is
not checked and hence the compiler does not include it in its error count.
In three of the four programs, the first error the compiler encountered was
an unforeseen one, so the compiler terminated after two or three cards,
saying that there were no errors. The compiler required five runs to get to
the end of one program. As a result this compiler missed a lot of errors,
including some that were missed several times. Because some
things had to be fixed to get the compiler to run to conclusion, the set
of errors listed below is somewhat different from the set described for the
previous ICL compiler. The following are the errors found, the explanations
of the errors, and the recoveries:

,ALGSE,1, in parameter list ... should have been ,ALGSEM1,
... "impermissible parameter" (E)

SANE AS ... "syntax error in file map" (G)

label equation in parameter list (5 times) ...
"impermissible parameter" (F)

<- (5 times) ... should have been := ... "statement
not recognizable" (F)

COMPILE GLYPNIR ... "statement not recognizable" (F)

B5500 FILES ... "statement not recognizable" (F)

ILLIAC PROGRAM = ... "statement not recognizable" (F)

no BEGIN after ALL OF ... "statement not recognizable" (F)

, OUTPUT = X/Y) = BX ... "impermissible parameter"
and "label equation required" (F, P)

PRINT( "string") (4 times) ... "missing string" (P)

; missing (2 times) ... "statement not recognizable" (P)

THEN missing ... "statement not recognizable" (P)

XF [1000] ... "missing file map specification" (P)
This is a total of 26 errors. The recoveries were rated as follows: one
excellent, one good, 15 fair, and nine poor. There were 98 errors missed,
including those missed several times, but there were no extra errors created.
This gives an effectiveness rating of .093 by the effectiveness formula:

$$\text{Effectiveness} = \frac{1 + 1 \cdot \frac{3}{4} + 15 \cdot \frac{1}{2} + 9 \cdot \frac{1}{4}}{26 + 98} \cdot \frac{26}{26 + 0} = .093$$

This compiler was rarely able to do more than tell the programmer
that an error occurred, and it was unable to recover so many times that it
missed many errors, causing the programmer to make additional runs. Hence
this compiler was very ineffective in its error recovery, as reflected by
its effectiveness value of .093.


5.4 OSL

5.4.1 Introduction to OSL

OSL was designed by Peter Alsberg (2) as part of his Ph.D. research.
As an operating system language, it is functionally similar to ICL. It is
much more complex than ICL, however; in fact it is the most complex and the
largest language implemented on the translator writing system. This fact
itself led to some of the conclusions reported later. A complete
description of OSL is available in Alsberg (2). Because of the size of the
definition of OSL, the syntax is not included in this document. Hence
appropriate comments will be made along with the discussion of each error
in order to clarify the syntactic error that was made in each case.

5.4.2 OSL Error Recovery

As OSL was being developed, its creator ran several test
programs to debug his work and found the error recovery features implemented
quite satisfactory. The only drawback was the significantly long time the
compiler took to make the recoveries. It was this fact which led to the
use of two symbol strings instead of three symbol strings. Other
changes were also made to the string generation procedure to reduce the
time involved. Unfortunately, all of the creator's tests were discarded.
Hence no set of programs with errors was available for this thesis, and all
of those discussed below are programs the author ran himself.

In  the  following  listing  of  the  errors  found  by  the  OSL  compiler, 
the  error  is  given,  followed  by  the  appropriate  syntax  constructs  and  the 
recovery  that  was  used. 

; missing (4 times) ... same use as in ALGOL ...
; inserted (E)

ESAC missing ... CASE <INTEGER-EXPRESSION> OF list of
<PATTERN EXPRESSION>s separated by commas
ESAC ... ESAC inserted (E)

FI missing ... IF <BOOLEAN EXPRESSION> THEN <STATEMENT LIST>
[ELSE <STATEMENT LIST>]? FI ... FI inserted (E)

END missing ... same use as in ALGOL ... END inserted (E)

operator  missing  in  K  <-  J  +  6l  ;  . . .  *  (multiply) 

inserted  (E) 

, used as separator in a <PATTERN EXPRESSION> where &
was the required separator ... , changed to & (E)

( and ) missing from SPAN construct in <PATTERN EXPRESSION>
... SPAN ( <STRING EXPRESSION> [ , <INTEGER EXPRESSION> ]? )
... inserted both ( and ) in correct positions (E,E)

FOR J <- 1 STEP 1 TO K DO ... where <FOR EXPRESSION> ::=
<INTEGER EXPRESSION> / VALUE ( <INTEGER EXPRESSION> ) and
a <FOR LIST ELEMENT> can be <FOR EXPRESSION> [ TO <FOR
EXPRESSION> [ BY <FOR EXPRESSION> ]? / STEP <FOR EXPRESSION>
UNTIL <BOOLEAN EXPRESSION> ], the correct symbols here should
be 1 TO K BY 1 or 1 STEP 1 UNTIL K < J ... TO was
changed to UNTIL and later = <number> was inserted
after K (E,G)

6 x 23 ... *, not x, is the multiply operator ...
changed x to + (G)

)  missing  in  a  structure  declaration  ...  ) 

inserted  but  not  at  the  place  intended  (G) 

extra  )  in  structure  declaration  . . .  last  )  deleted  whereas 

the  next  to  last  was  really  the  extra  (G) 

initial BEGIN missing ... an OSL program can be a
<BLOCK>, a <COMPOUND STATEMENT>, or a <PROCEDURE
DECLARATION> ... inserted PROCEDURE after the first
declaration type, namely INTEGER. The first
declaration was INTEGER I <- 0, J <- 1, K <- 0;
The <- 0, after the I were replaced with ; .
It is difficult to tell how the error recovery
would have handled the rest of the program because,
from the J on, the symbols the parser thought
were syntactically correct were rejected by the
semantic tests, and consequently the sequence
of errors in the program became very confusing.
(Rated F with 1 extra; the rest is disregarded, since
handling the semantics differently could have made
a big difference.)
There were 17 errors detected. The recoveries were rated as follows:
12 excellent, four good, and one fair; no errors were missed, but two extra
errors were created, so for this particular set of examples the OSL error
recovery has an effectiveness of

$$\text{Effectiveness} = \frac{12 + 4 \cdot \frac{3}{4} + 1 \cdot \frac{1}{2}}{17} \cdot \frac{17}{17 + 2} = .816$$

Since this is such a small set of errors and since some of them were
intentional, these results cannot be considered typical. However, it is
not clear whether the effectiveness would be better or worse for this
compiler if it were used more extensively.

5.5 Summary of the Results from DEMALGOL, TESLA, ICL, and OSL

Table 3 summarizes the results from DEMALGOL, TESLA, ICL, and
OSL. It seems clear that the compilers constructed using this system do a
very good job of describing the programmer's error in such a way as to
minimize his effort in finding his syntactic errors. When compared with
other compilers for the same language, the compilers with the automatic
error recovery system were far superior to the compilers with hand-coded
error recovery.
Language  Compiler                          E       G       F       P    M    X        Effectiveness
DEMALGOL  FPL, automatic error recovery     51      8       0       0    0    5 (1)    .891 (.95)
          ALCOR-Illinois                    27      15      7       1    6    54 (38)  .421 (.497)
          Burroughs ALGOL                   12 (0)  5 (0)   1 (18)  6    21   0        .394 (.233)
TESLA     FPL, automatic error recovery     10      9       2       0    0    6        .657
ICL       FPL, automatic error recovery     19      12 (7)  5       1    6    5        .630 (.655)
          Recursive descent ICL             1       1       15      9    98   0        .093
OSL       FPL, automatic error recovery     12      4       1       0    0    2        .816

E = excellent, G = good, F = fair, P = poor, M = missed, X = extra. Numbers
in parentheses are the alternative counts discussed in the text.

Table 3. Comparison of DEMALGOL, TESLA, ICL, and OSL
CHAPTER 6. LANGUAGE DESIGN CONSIDERATIONS FOR IMPROVED ERROR RECOVERY

6.1 Introduction

The  system  of  error  recovery  described  in  this  thesis  is  automatic, 
and  therefore  the  language  designer  does  not  have  to  make  a  special  effort 
to  design  the  error  recovery  for  his  compiler.  However,  there  are  ways 
he  can  improve  the  error  recovery  in  his  compiler  by  the  way  he  defines  his 
language.  This  is  similar  to  the  situation  in  which  a  programmer  can  write 
a  more  efficient  high-level  language  program  by  taking  cognizance  of  the 
functioning  of  the  compiler.  The  purpose  of  this  chapter  is  to  indicate  how 
some  of  the  types  of  language  structures  affect  the  operation  of  the  error 
recovery  of  the  compiler.   Several  situations  to  be  avoided  as  well  as 
several  to  be  sought  will  be  described. 

6.2  Identifiers  and  Semantic  Tests 

This section discusses the effect upon the error recovery of the
use of semantic tests to determine the validity of the presence of a
particular identifier at that point in the parse.

In the examples discussed previously, the error situations with
the poorest recovery were usually those involving identifiers whose legality
was determined on the basis of previously declared attributes of those
identifiers. Because any semantic action call may have side effects, no
semantic tests are called during error recovery. This means that the source
is changed on the basis of purely syntactic considerations, and in all the
previous languages all identifiers are alike with regard to the syntactic
structure. The result is that sometimes the recovery produces something
which is syntactically correct but which fails when the semantic tests are
applied. Furthermore, because of an earlier semantic mistake, a crucial
test can fail and send the parser down an incorrect branch before a
syntactic mistake is found.

The problem is two-fold. First, the parser as currently implemented relies completely upon semantic tests, when they are present, to resolve any conflicts in the Floyd productions, ignoring any syntactic context. Second, the parser is blind to semantic differences between identifiers, treating them all as equals.

The first part of the problem could easily be overcome by modifying the BNF to FPL conversion so that it ignores semantic tests except as a last resort. That is, all conflicts between Floyd productions would be resolved first by either lookahead analysis or combination, and only if both of these techniques failed to resolve a conflict would the presence of a semantic test be assumed to resolve it.

The second part of the problem could be resolved by allowing the scanner to produce different kinds of identifiers, so that they are not all alike to the parser. Since the system currently implemented provides for twenty different class symbols distinguishable by the parser (the first three of which are identifier, number, and string), the other class symbols could be used to represent identifiers with different declared attributes. For example, consider a language having several different kinds of identifiers. In a declaration, after the type of identifier being declared has been recognized, a global variable would be set to contain the code for that class of identifiers. One class would be undeclared identifiers, … 5 Boolean identifiers, 6 arrays, and so forth. When an entry is made in the symbol table, the class number would be stored with it. When an identifier is looked up in the symbol table, its class number would be obtained and used to report to the parser which class symbol was recognized. If the class number were 0, the undeclared identifier class number would be used. This would eliminate the need for the parser to depend on semantic tests to distinguish between different classes of identifiers, and it would also allow the error recovery to take this information into account, since the problem of side effects from semantic routines is eliminated. The parser would be less apt to take a wrong path, and the error recovery would be likely to be more precise.
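The scanner and symbol-table cooperation described above can be sketched as follows. This is a minimal Python illustration, not the thesis implementation: it assumes tokens arrive as strings and treats anything passing `str.isidentifier` as an identifier. The codes 5 and 6 follow the example in the text; the code 1 for undeclared identifiers and the names `SymbolTable`, `declaring_class`, and `scan` are hypothetical.

```python
# Hypothetical class codes. The first three class symbols are identifier,
# number, and string; reusing code 1 for undeclared identifiers is an
# assumption made only for this sketch.
UNDECLARED_ID = 1
BOOLEAN_ID = 5
ARRAY_ID = 6


class SymbolTable:
    """Symbol table that stores a class number with each declared name."""

    def __init__(self):
        self._classes = {}
        # Global variable set by the declaration's semantic routine once
        # the type of the identifiers being declared has been recognized.
        self.declaring_class = 0

    def declare(self, name):
        # Store the current class number along with the new entry.
        self._classes[name] = self.declaring_class

    def class_symbol(self, name):
        # Class 0 (including names never entered) is reported as the
        # undeclared identifier class.
        code = self._classes.get(name, 0)
        return code if code != 0 else UNDECLARED_ID


def scan(tokens, table):
    """Report identifiers to the parser as class symbols; pass others through."""
    return [table.class_symbol(t) if t.isidentifier() else t for t in tokens]
```

With `flag` declared as Boolean and `a` as an array, `scan(["flag", ":=", "a", "x"], table)` reports `[5, ":=", 6, 1]`, so the parser sees three distinct syntactic classes without calling any semantic test.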

A further benefit of this approach is the possibility of lifting the restriction in the string generator procedure that patterns be ignored when they match the input string only at an identifier. The reason for this restriction is the high frequency of identifiers in most languages. This high frequency suggests that a pattern which matches only at an identifier is more likely to be a wrong pattern than a correct one. However, if each type of identifier is a different syntactic quantity, the class associated with each type will occur less often than the single identifier class by itself. This means that the restriction mentioned above could be lifted, and all strings which have any match with the input string could be considered in the recovery. By being able to take advantage of more information, the recovery should be able to perform better, even beyond the improvement that would result from the removal of the semantic tests.

The recent change to the ICL compiler that was mentioned in Chapter 5 is the one discussed in the two paragraphs above. The different identifier classes which were formerly distinguished by semantic tests have been replaced with different syntactic classes. The restriction concerning class symbols in the string generator will be lifted. The effect of this on the error recovery is yet to be tested.

Identifier classes which are syntactically identical, differing only by semantic tests, are a problem for the error recovery, but this section has given two suggestions which could completely eliminate the problem from most languages.

6.3  Errors Which Resemble Valid Constructs

Another type of problem, illustrated by one construct in TESLA, is that of having a definition which resembles a common error situation. Thus, certain likely errors appear to be the initial part of some valid construct, causing the parser to go one or more symbols down the wrong path before the error is detected. This section deals with this kind of error situation.

The particular case in TESLA is the <input assignment> list, which has <data pattern specification> lists as its elements. Both lists have their elements separated by delimiters. The <data pattern specification> list begins with an identifier, followed by possibly one "[number : number]", followed by either an identifier or a literal <bit pattern>. Each succeeding <data pattern specification> has a "[number : number]" followed by either an identifier or a literal <bit pattern>. A common error was interchanging the "[number : number]" and the identifier in front of it. This made that identifier look like the identifier which really can follow the "[number : number]". Hence, after the missing identifier error, in which an identifier was inserted, there was another error when the literal <bit pattern> came up.
Because the identifier looked legitimate, the parser went too far, finishing the <data pattern specification> prematurely before it found the error. Hence, when it did find the error, the recovery had to be more complicated and was less clear to the programmer, since the parser had gone past the actual circumstance of the error. The recovery could only describe the error in terms of the position in the parse at which the error was finally detected. This happened to be the beginning of either a <data pattern specification> or an <input assignment> list, rather than the last part of one.

In general, the problem illustrated by the TESLA example is that when an error resembles some other definition which is valid at that point, the parser follows the other definition until things no longer match. The error then looks completely different from the actual error, and the description given to the programmer is probably either not helpful (fair) or misleading (poor).

Although error situations which initially resemble valid constructs cannot always be anticipated when the language is being designed, the more distinct the definitions are from one another, the better the error recovery will be, since it will be less likely that an error will look like some other construct.

6.4  Representing Error Constructs in the Syntax

Sometimes an error situation which would look like some other construct can be represented specifically in the syntax. It can then either be allowed as part of the language or described as an error by a semantic routine. Since it has been added to the syntax, it will not be flagged as a syntactic error.
One example of this from DEMALGOL is the production <TYPE> ::= <identifier>. The problem this production solved was that a non-DEMALGOL declaration type appeared to the DEMALGOL compiler as an identifier. An identifier following a semicolon or BEGIN looked like the beginning of an assignment statement. When the compiler came to the first identifier that was declared to have that type, an error in an assignment statement was detected and an assignment arrow was inserted. The ensuing commas all had to be changed to operators. Furthermore, the next declaration was in error, since all the declarations had to come at the beginning of the block, before any statements. This error was easily fixed by inserting a BEGIN, but that necessitated an additional END at the end of the program. Adding the above production, with a semantic routine which stated that the declaration type was not valid, solved all of these problems very neatly. However, that change now makes an assignment statement with a missing assignment arrow look like a declaration. Of the two mistakes, the latter seems to be less frequent than the former, so this trade-off is preferable.

Another example of an error which could be allowed as valid is illustrated by the semicolon in the following ALGOL construct: IF <Boolean expression> THEN <statement>; ELSE <statement>. Since the semicolon is valid if the statement is a partial conditional, the parser concludes the partial conditional statement and finds an error at the beginning of a statement when it looks at the ELSE. The ALCOR compiler checked for a semicolon preceding an ELSE. In GLYPNIR, see Lawrie (66), the ELSE was replaced with [ELSE / ; ELSE], making the semicolon syntactically valid. Although there is no reason not to allow the semicolon as a valid construct in the language once it has been added to the syntax, a semantic routine could print an error message. OSL and ALGOL 68, see van Wijngaarden, et al. (97), avoid this error by allowing statement lists following the THEN and the ELSE but requiring an easily forgotten keyword FI at the end of the conditional statement. The FI enables the parser to conclude the conditional statement. However, a forgotten FI could allow a number of statements to pass before a situation is reached where the FI is required for the parser to continue.
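The GLYPNIR-style rule [ELSE / ; ELSE] can be illustrated with a toy recursive-descent parser. This Python sketch is an assumption-laden illustration, not the GLYPNIR compiler: a Boolean expression and a simple statement are each a single token, and the class name `CondParser` and its `messages` list are hypothetical. The semicolon before ELSE is accepted as valid syntax, and a warning is recorded that a semantic routine could print.

```python
class CondParser:
    """Toy parser for IF <Boolean expression> THEN <statement> [ELSE / ; ELSE] <statement>.

    Simplifications assumed by this sketch: a Boolean expression and a
    simple statement are each exactly one token.
    """

    def __init__(self, tokens):
        self.toks = tokens
        self.pos = 0
        self.messages = []  # warnings a semantic routine might print

    def peek(self, k=0):
        i = self.pos + k
        return self.toks[i] if i < len(self.toks) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r} at token {self.pos}")
        self.pos += 1

    def statement(self):
        if self.peek() == "IF":
            self.pos += 2  # skip IF and the one-token Boolean expression
            self.expect("THEN")
            then_part = self.statement()
            # [ELSE / ; ELSE]: a semicolon directly before ELSE is valid syntax.
            if self.peek() == ";" and self.peek(1) == "ELSE":
                self.messages.append("semicolon before ELSE ignored")
                self.pos += 1
            if self.peek() == "ELSE":
                self.pos += 1
                return ("if", then_part, self.statement())
            return ("if", then_part, None)  # partial conditional
        tok = self.peek()
        if tok is None or tok in ("THEN", "ELSE", ";"):
            raise SyntaxError(f"expected a statement at token {self.pos}")
        self.pos += 1
        return ("stmt", tok)
```

Both `IF b THEN s1 ; ELSE s2` and `IF b THEN s1 ELSE s2` produce the same tree; only the first leaves a message, so no syntactic error is ever raised for the stray semicolon.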

Again, it is not possible to conceive of all the potential error situations at the time the language is designed and to anticipate them through additions to the syntax. However, any which are discovered or anticipated and can be handled precisely with such additions will make the error recovery that much better at communicating with the programmer.

6.5  Problem of Segmented Language Structure

The problem dealt with in this section is one which arises in constructs like an ALGOL block, in which all declarations must precede all statements.

The problem of declarations coming after statements, or of an error in a declaration making it look like a statement and thereby throwing off all the remaining declarations, has occasionally been eased by allowing statements and declarations to be intermixed syntactically. This is done by ICL and also by the GLYPNIR compiler.
6.6  Redundancy

The example in Section 6.4 with the ";ELSE" illustrates another step that can be taken in the language design to facilitate error description and recovery: allowing some redundancy in the language. This actually amounts to accepting as valid some situations which would otherwise be errors. Sometimes it is quite easy to allow several different ways of saying the same thing, including several of the more common ways that a given construct might be written in error. When a pair of symbols or constructs might easily be interchanged, thought should be given to whether there is any reason why both alternatives should not be permissible. If this could be done in TESLA, it would help the TESLA error recovery. The use of redundancy in places which are easily written incorrectly would be a big help to the programmer.

6.7  Noise Symbols

Another type of redundancy which could be added is the presence of noise symbols, i.e., either symbols which are currently in the language but not actually needed for parsing, or symbols which are not in the language but are apt to be inserted by a programmer. In either case, permitting both the presence and the absence of these symbols in the language would probably allow a few situations which are errors to become valid, and therefore the programmer would have less to do in order to write an acceptable program.

In ALGOL, the ":" of ":=" in an assignment statement is noise, but the same symbol in an assignment appearing as a primary within a Boolean expression is not noise. Allowing "=" in addition to ":=" in the places where there would be no ambiguity would eliminate some errors. Also in ALGOL, the TO of GO TO is noise and is not required. In ICL, OF in ALL OF BEGIN is a noise word and is not required.

However, none of these examples causes problems for the error recovery. In each case, the compiler would be able to identify exactly what was missing and tell the programmer what symbol or symbols he must put in. In fact, it is the characteristic that makes symbols noise symbols, namely that the parse is completely determined without them, that allows the parser to tell exactly what is missing. Therefore, when the parser gets to them and does not find them, it is able to tell exactly what is missing, since there is only one path valid at that point. Also, since the parse is completely determined without them, they can be left out of the definition of the language without introducing any ambiguity in the resulting parser.
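The argument can be made concrete with the GO TO example. In the Python sketch below (an illustration only), tokens are strings, and the hypothetical `to_optional` flag switches between treating TO as an optional noise word and as a required one. In both modes GO alone determines the parse, so when TO is required but absent the parser can name exactly the symbol it inserts.

```python
def parse_goto(tokens, pos, to_optional=True):
    """Parse GO [TO] <label> beginning at tokens[pos].

    Returns (label, next_pos, messages). GO alone determines the parse, so
    a missing TO can always be reported precisely.
    """
    messages = []
    if pos >= len(tokens) or tokens[pos] != "GO":
        raise SyntaxError("expected GO")
    pos += 1
    if pos < len(tokens) and tokens[pos] == "TO":
        pos += 1
    elif not to_optional:
        # Only one path is valid here, so the exact missing symbol is known.
        messages.append("missing TO inserted after GO")
    if pos >= len(tokens):
        raise SyntaxError("expected a label")
    return tokens[pos], pos + 1, messages
```

With TO optional, `GO L1` and `GO TO L1` are both accepted silently; with TO required, `GO L1` still parses, and the only message is a precise one.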

Another aspect of required noise symbols is that they can sometimes help in the recovery from other errors. For example, if a crucial symbol which distinguishes several possible alternative branches is in error (left out, misspelled, etc.) but each branch has another noise symbol which is not in error, then the error recovery mechanism will be able to tell from the noise symbol which path was intended. Similarly, if a particular language allows optional noise symbols throughout the program, it is probably to the programmer's advantage to make a habit of using them, since for the same reason his errors might be more clearly indicated to him than if he left them out.

The use of noise symbols is an example of a redundancy which can improve the general error recovery.
6.8  Delineators

A type of redundancy which is useful in some error situations is the delineator, or separator, between constructs of the language. This is illustrated by the semicolon in ALGOL. The semicolon between statements and declarations is not needed for parsing but serves to delineate the statements and declarations. When the parser takes the wrong path first and is not able to find an appropriate correction, or when the program contains a large set of extraneous symbols so that nothing the parser tries will fix the bad string, the parser must skip symbols to find one which follows one of the symbols that it is seeking. It is with this type of error that effective delineation symbols in the program are important. In an ALGOL-like language, there is usually a statement or declaration marker in the marker stack, and hence the symbol-skipping will usually stop at a semicolon and finish off either the statement or the declaration. This gets the parser back on track reasonably effectively, missing only those errors which remained in that statement or declaration. Without the semicolon, an identifier could follow a statement as either a label or the left hand side of an assignment statement, and therefore the skipping would stop at the next identifier, the error recovery assuming that it had come to the end of the current statement. Clearly, the use of an identifier as a delineator will frequently be erroneous, and the parser will immediately find another serious error. The recovery will be poor, causing the programmer much frustration in identifying the legitimate errors. Having delineators such as semicolons scattered throughout the program will help significantly in recovering from errors for which the only recovery available is to skip symbols until a suitable following symbol is identified.
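The effect of delineators on symbol-skipping can be seen in a small Python sketch. The token list and both follower sets are hypothetical: the first models a language whose statements are followed by ";" or END, the second a language without semicolons, where an identifier can follow a statement. `ID` stands for the single class symbol the scanner reports for every identifier.

```python
ID = "<identifier>"  # class symbol reported for every identifier


def skip_to_follower(tokens, pos, followers):
    """Discard symbols until one that can follow a construct being sought.

    Returns the index of the first token in `followers`, or len(tokens).
    """
    while pos < len(tokens) and tokens[pos] not in followers:
        pos += 1
    return pos


# Remainder of a badly garbled statement, then the next statement.
bad = ["+", "*", ID, ")", ID, ";", ID, ":=", ID, ";"]

with_semicolons = skip_to_follower(bad, 0, {";", "END"})
without_semicolons = skip_to_follower(bad, 0, {ID, "END"})
```

With semicolons as followers, skipping stops at index 5, the true end of the bad statement. When identifiers can follow a statement, it stops at index 2, in the middle of the garbage, and the parser will almost certainly hit another error at once.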
Since identifiers usually occur in many different constructs, it is usually a good plan to avoid situations in which they would become following symbols. It is important to know how the compiler building system operates in order to know what will become following symbols in the language. In the translator writing system described in Chapter 2, there is a marker for every nonterminal symbol in a nonfirst position on the right hand side of the BNF productions. If the nonterminal symbol is followed by a terminal symbol, then that terminal symbol is a unique following symbol. If it is followed by a nonterminal symbol, then all the first terminal symbols of all the derivations of that nonterminal are the following symbols. If the nonterminal is at the end of the right hand side of the BNF production, then the following symbols are all the symbols which can follow the left hand side nonterminal wherever it occurs in the syntax. With this in mind, the productions or definitions can be chosen so as to improve the sets of following symbols, in particular so that right parentheses and brackets, semicolons, commas, key words, etc., are following symbols for markers that are as global as possible.
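The three rules for following symbols can be computed mechanically, which lets a language designer inspect them while choosing productions. The Python sketch below assumes a grammar stored as a dict from nonterminal to a list of right hand sides, with no empty productions and a hypothetical end marker `$` after the goal symbol. Anything that is not a key is a terminal. The sample grammar is illustrative, not taken from any of the four compilers.

```python
def first_terminals(grammar, symbol, seen=None):
    """First terminal symbols of all derivations of `symbol`.

    Assumes no empty productions; `seen` guards against left recursion.
    """
    if symbol not in grammar:  # a terminal is its own first symbol
        return {symbol}
    seen = set() if seen is None else seen
    if symbol in seen:
        return set()
    seen.add(symbol)
    result = set()
    for rhs in grammar[symbol]:
        result |= first_terminals(grammar, rhs[0], seen)
    return result


def following_symbols(grammar, goal, end_marker="$"):
    """Symbols that can follow each nonterminal anywhere in the syntax."""
    follow = {nt: set() for nt in grammar}
    follow[goal].add(end_marker)
    changed = True
    while changed:  # iterate until no set grows
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    if i + 1 < len(rhs):
                        new = first_terminals(grammar, rhs[i + 1])
                    else:
                        new = follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def marker_followers(grammar, follow):
    """Following symbols for each nonterminal in a nonfirst RHS position."""
    table = {}
    for lhs, alternatives in grammar.items():
        for k, rhs in enumerate(alternatives):
            for i in range(1, len(rhs)):
                if rhs[i] not in grammar:
                    continue
                if i + 1 < len(rhs):  # next symbol: terminal or its first terminals
                    table[(lhs, k, i)] = first_terminals(grammar, rhs[i + 1])
                else:  # last symbol: whatever can follow the left hand side
                    table[(lhs, k, i)] = set(follow[lhs])
    return table


GRAMMAR = {
    "<block>": [["BEGIN", "<stmts>", "END"]],
    "<stmts>": [["<stmt>"], ["<stmts>", ";", "<stmt>"]],
    "<stmt>": [["id", ":=", "<expr>"]],
    "<expr>": [["id"], ["<expr>", "+", "id"]],
}
```

In this grammar every marker's following set consists of ";" or END, never the identifier class, which is the kind of result the advice above aims for.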

This section has shown the desirability of having delineators in the syntax, and the language specification considerations necessary to make the delineators effective in error recovery.

6.9  Ordering of Alternatives

The final consideration to be mentioned is the order of the definitions. Because of the parsing algorithm used in the examples in this research, the order of the definitions does not affect the parsing, except perhaps in the speed of the compiler. In other parsing algorithms, the order may be fixed in order to parse correctly. However, where possible, the ordering can be changed to improve the error recovery in some situations. Since the error recovery changes the source string according to the first occurrence of the first pattern match it finds, the order of the definitions will affect which symbols are put in. The error recovery will always search the table of generated strings for any occurrence of the first pattern before trying the second, then for any occurrence of the second before trying the third, and so forth. The search stops with the first pattern to match a generated string, and that string is used to correct the program. Therefore, whenever there are parallel constructs (which would create the same patterns in most cases), the most desirable choice should be the first definition. Since the most desirable choice for error correction or description is also probably the most frequently occurring construct, the order that is best for error description is also the order that is best for parsing speed. For example, "+" and "-" are usually in parallel constructs, and since "+" is the more frequently used of the two (62), it should be the first definition. Similarly, the most common declaration type should be the first definition of declaration type. Other orderings may be more subtle. For example, it may be possible to cause the error recovery to insert a multiplication operator rather than an addition operator where an operator was left out of an expression, by changing the order of two productions which would not at first seem to be parallel. This may have been the case with DEMALGOL, although it has not been verified.
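The search order can be modeled in a few lines to show why definition order decides which operator is inserted. In this Python sketch, patterns are predicates tried in priority order, and the generated strings are kept in definition order. `one_insertion` is a hypothetical stand-in for one of the recovery's patterns (the candidate equals the input with one symbol added), not the thesis's actual pattern set.

```python
def first_correction(patterns, generated):
    """Return the generated string chosen by the recovery, or None.

    Each pattern, in priority order, is tried against the whole table of
    generated strings (kept in definition order) before the next pattern
    is tried; the first match wins.
    """
    for pattern in patterns:
        for candidate in generated:
            if pattern(candidate):
                return candidate
    return None


def one_insertion(source):
    """Hypothetical pattern: candidate equals `source` with one symbol added."""
    def matches(candidate):
        if len(candidate) != len(source) + 1:
            return False
        return any(candidate[:j] + candidate[j + 1:] == source
                   for j in range(len(candidate)))
    return matches


# "a b" with the operator left out; identifiers are the class symbol "id".
source = ["id", "id"]
plus_first = [["id", "+", "id"], ["id", "-", "id"]]
minus_first = plus_first[::-1]
```

Both generated strings match the same pattern, so the correction depends only on definition order: with "+" defined first the recovery inserts "+", and with the order reversed it inserts "-".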

In short, the definitions should be ordered so that the most frequently occurring constructs come first. This will not only improve parsing speed but will also cause the error recovery to insert the more common of two or more symbols defined in parallel.
6.10  Conclusion

This chapter has given various suggestions which a language designer can use to improve the error recovery in a compiler implemented on the system described in this thesis. Some of these suggestions pertain only to the language design and are independent of the system on which the compiler is implemented. The considerations discussed have included: (1) avoidance of semantic tests, (2) use of different identifier classes, (3) avoidance of syntactic constructs which resemble common errors, (4) representation of some errors in the syntax, followed either by printing an error message in a semantic routine or by allowing the construct as valid, (5) an alternative to separating syntactic constructs into sequential groups, (6) use of redundancy, (7) use of noise symbols, (8) use of delineators, and (9) ordering of alternative definitions. Applying these concepts to the design of a language whose compiler is constructed by the system outlined in this thesis would lead to a compiler with very effective syntactic error recovery. Such a compiler would be very easy for a programmer to use, in the sense that he would have very little trouble getting his programs to be syntactically correct.
CHAPTER 7.  EXTENSION TO RECURSIVE DESCENT PARSING

7.1  Additional Considerations Needed for Recursive Descent

Error detection is inherent in the modified Floyd production language parsing algorithm but not in the recursive descent parsing algorithm. This is the primary problem in applying the method of automatic error recovery presented in this thesis to recursive descent parsing, and it is primarily with the problem of error detection in recursive descent parsing that this chapter is concerned.

If, in parsing a program according to a group of Floyd productions, no Floyd production applies, then an error has been detected. In recursive descent parsing algorithms, the fact that a given production did not apply does not imply an error. It could be just that the wrong branch was tried and that some other branch is the valid one. In general, the only guarantee that there is an error is the event that the parse backs up to the program goal symbol with a false value. The approach toward error recovery taken by many recursive descent compiler building systems is to require the language designer to catch all the errors through special error productions and semantic routines. He must define not only the language but also the complement of the language, with the latter denoted as error constructs. He must also define appropriate semantic routines to tell the programmer of the error.

If the problem of identifying the error can be solved sufficiently well, then the previously described error recovery mechanisms can be used to recover from the errors. Uniquely occurring terminal symbols can be inserted in the same manner as was described earlier. Other errors can be recovered from by using the parsing tables to generate strings which are compared with the given string for a possible change which will fix the error; if this fails, symbols can be thrown away until one is found which can follow any of the constructs currently being sought.

This chapter describes a method of putting error detection into a recursive descent parsing algorithm. This error detection mechanism requires a minimal initial analysis of the syntax of a language.

7.2  Initial Analysis

In the initial analysis of the syntax, the syntax table is marked with three types of information: (1) some symbols are marked as required; (2) some terminal symbols are marked as unique; and (3) a table of following symbols is constructed.

To mark the appropriate symbols as required, all the definitions for a particular nonterminal are compared from left to right. When the string to the left of the symbol being examined is different from the initial strings of all other definitions for this nonterminal, then that symbol and all those following it in the definition are marked as required symbols.
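The comparison rule can be written out directly. The Python sketch below takes the definitions of one nonterminal as lists of symbol names (a representation assumed for illustration) and returns one list of required flags per definition. The sample alternatives are hypothetical.

```python
def mark_required(alternatives):
    """Mark required symbols in the definitions of one nonterminal.

    Each definition is scanned left to right. At the first position where
    the string to the left of the symbol differs from the same-length
    initial string of every other definition, that symbol and all later
    symbols in the definition are marked required.
    """
    marks = []
    for k, rhs in enumerate(alternatives):
        others = [alt for j, alt in enumerate(alternatives) if j != k]
        start = len(rhs)  # default: no symbol in this definition is required
        for i in range(len(rhs)):
            prefix = rhs[:i]
            # A shorter definition's initial string cannot equal a longer prefix.
            if all(other[:i] != prefix for other in others):
                start = i
                break
        marks.append([i >= start for i in range(len(rhs))])
    return marks


ALTERNATIVES = [
    ["id", ":=", "<expr>"],
    ["GO", "TO", "id"],
    ["IF", "<cond>", "THEN", "<stmt>"],
    ["IF", "<cond>", "THEN", "<stmt>", "ELSE", "<stmt>"],
]
```

The first two definitions become required from their second symbol on. When the rule is applied literally, the shared prefix IF <cond> THEN <stmt> stays unmarked in both conditional definitions, because neither is distinguished until ELSE has been seen. In the longer definition, only the final <stmt> is required.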

The second function which must be performed is the identification of all those terminal symbols which occur in only one place in the syntax. Their occurrence is marked with a special flag.

Finally, entries must be made in a table of following symbols. This table will have an entry for each symbol, terminal or nonterminal, which is marked as required, and an entry for each non-required nonterminal symbol which has a unique terminal symbol in one of its definitions. Each entry will include all the terminal symbols which can immediately follow that entry.
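The second and third steps of the analysis can be sketched the same way. This Python illustration assumes the grammar is a dict from nonterminal to lists of right hand sides. As a simplification, `required` is passed in as a plain set of symbol names, whereas the thesis marks individual occurrences in the syntax table. The function names and the sample grammar are hypothetical. The sketch only decides which symbols receive table entries; the following sets themselves would be computed as in Section 6.8.

```python
from collections import Counter


def unique_terminals(grammar):
    """Terminal symbols that occur in exactly one place in the syntax."""
    counts = Counter(
        sym
        for alternatives in grammar.values()
        for rhs in alternatives
        for sym in rhs
        if sym not in grammar  # anything that is not a nonterminal is a terminal
    )
    return {sym for sym, n in counts.items() if n == 1}


def following_table_entries(grammar, required, unique):
    """Symbols that get an entry in the table of following symbols.

    `required` is assumed to be the set of symbols already marked as
    required; non-required nonterminals with a unique terminal in one of
    their definitions are added.
    """
    entries = set(required)
    for nt, alternatives in grammar.items():
        if nt not in required and any(s in unique for rhs in alternatives for s in rhs):
            entries.add(nt)
    return entries


GRAMMAR = {
    "<stmt>": [["id", ":=", "<expr>"], ["GO", "TO", "id"]],
    "<expr>": [["id"], ["(", "<expr>", ")"]],
}
```

Here "id" occurs three times and is not unique, while ":=", GO, TO, "(", and ")" each occur once. Both nonterminals therefore receive entries even when they are not themselves required.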
7.3  Parsing and Error Detection

In parsing, the special markings denoting required symbols and unique terminal symbols are used to mark each search as required or not. A required search which concludes with the value false constitutes an error. Recovery takes one of two approaches, depending on whether a terminal or a nonterminal was sought.

When the search for a symbol is initiated, the search is marked as required or not according to the logical AND of the current search's marking and the required flag for that symbol. The entry pointer to the table of following symbols is stacked. An error is then detected when the symbol is not found and the search for that symbol is marked as required; otherwise, simply the wrong branch has been attempted.

When a terminal symbol is found which has the unique terminal flag, the current search is marked as required, regardless of whether or not it was so marked before.
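The flag bookkeeping of this section fits in a small class. The Python sketch below models only the marking rules, not a whole parser. The class `Search` and its method names are hypothetical.

```python
class Search:
    """Bookkeeping for one active search in a recursive-descent parser."""

    def __init__(self, symbol, enclosing_required, symbol_required):
        self.symbol = symbol
        # Required only if the enclosing search is required AND this
        # occurrence of the symbol carries the "required" flag.
        self.required = enclosing_required and symbol_required

    def recognized_terminal(self, unique):
        # A uniquely occurring terminal confirms the current search,
        # whether or not it was marked required before.
        if unique:
            self.required = True

    def outcome_if_not_found(self):
        # Failure of a required search is an error; otherwise the parser
        # has simply attempted the wrong branch.
        return "error" if self.required else "wrong branch"
```

A search that starts out optional becomes required once a unique terminal is recognized inside it, so a later failure is reported as an error instead of being treated as an ordinary backtrack.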

When an error is detected, the following steps are taken. If the symbol being sought is a terminal symbol, then exactly the same steps are taken as were described for the case of unique terminal symbols in the modified Floyd production parsing algorithm. These are the steps implemented by the PUTINSTACK procedure. If the symbol being sought is a nonterminal, the string generator procedure is called first. If it can find a change to the string which will correct the error, then that change is made and parsing begins again at the beginning of the search for this nonterminal. If no correction can be found, the set of symbols is formed which is the OR of all the following symbol sets indicated in the stack of following symbol table entries. Symbols are then discarded until one is found which is a member of this set. The nonterminal symbol is then assumed to have been found, and parsing continues.
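The recovery steps for a failed nonterminal search can be sketched as one Python function. It is an illustration under stated assumptions: `try_correction` is a hypothetical callback standing in for the string generator procedure, `follow_stack` holds the following-symbol sets whose table entries were stacked as searches began, and the terminal case (PUTINSTACK) is not modeled.

```python
def recover_nonterminal(tokens, search_start, pos, follow_stack, try_correction):
    """Recovery after a required search for a nonterminal fails at tokens[pos].

    `try_correction(tokens, pos)` stands in for the string generator
    procedure: it returns a corrected token list, or None if no change
    fixes the error. Returns (tokens, position to resume at, action).
    """
    corrected = try_correction(tokens, pos)
    if corrected is not None:
        # Parsing begins again at the start of the search for this nonterminal.
        return corrected, search_start, "retry"
    # OR together every following-symbol set indicated on the stack.
    followers = set().union(*follow_stack)
    while pos < len(tokens) and tokens[pos] not in followers:
        pos += 1  # discard symbols
    # The nonterminal is assumed found; parsing continues at tokens[pos].
    return tokens, pos, "assumed"
```

For example, if the search for an expression began at index 0 of `["id", ")", "*", ";", "id"]` and failed at index 1, a successful correction restarts the search at index 0. If no correction is found, symbols are discarded up to the ";" at index 3, provided ";" is in one of the stacked sets.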
7.4  Rationale for Required Searches

This section gives the reasoning behind the setting of the flags discussed above and their use in parsing.

If a particular nonterminal symbol is required, one of the definitions for that nonterminal is required. Once enough symbols have been seen to identify which definition is valid, the rest of the symbols of that definition are required. This is the motivation for the setting of the "required symbol" flags. If this nonterminal is not required, then neither its definitions nor any of the symbols in those definitions are required. Hence, at parsing time, the requirement of finding a symbol is the logical AND of the current search and the requirement for this symbol.

If  a  particular  terminal  symbol  occurs  only  in  one  place  in  the 
syntax,  then  the  recognition  of  that  symbol  confirms  that  the  partial  parse 
ending  with  that  symbol  is  correct  since  there  is  no  other  branch  on  which 
this  symbol  can  occur.   Hence  the  recognition  of  this  symbol  means  that  the 
current search can be set as required. With a more detailed analysis of the
syntax, this could be done for two-symbol pairs or even triples. That is,
all unique two-symbol pairs could be identified and, whenever the second
symbol of such a pair was recognized, the current search could be marked as
required.

However,  only  the  current  search  can  be  marked  as  required  by  the 
recognition  of  a  unique  symbol.  To  illustrate  this,  consider  the  syntax  of 
Figure  18. 
<A> ::= <B> c | <D> e
<B> ::= <F> g
<D> ::= <F> h
<F> ::= f

A BNF Grammar Illustrating Unique Terminal Symbol Flag Use

Figure 18

The symbol "f" is a unique symbol, but it can cause only the search for <F>
to be required, since the correct parse may include either <B> or <D>. One
could extend the analysis of the syntax to determine those cases in which
the recognition of a unique symbol could mark more than just the current
search as required. At some point, however, the analysis might become complex
enough that one should simply make the conversion to modified Floyd
productions, or to some other deterministic parsing algorithm, which has much
more accurate error detection anyway.
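The preprocessing step that finds uniquely occurring terminals, and the two-symbol-pair extension suggested above, can be sketched by counting occurrences over the grammar. The dictionary representation of the grammar and all function names are assumptions made for illustration; the pair version is simplified to terminals written next to each other inside a single definition.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# A grammar maps each nonterminal to its list of definitions (alternatives).
Grammar = Dict[str, List[List[str]]]


def is_terminal(sym: str) -> bool:
    """Nonterminals are written in angle brackets, as in the figures."""
    return not (len(sym) > 2 and sym.startswith("<") and sym.endswith(">"))


def unique_terminals(g: Grammar) -> Set[str]:
    """Terminals that occur at exactly one place in the whole syntax."""
    counts = Counter(
        sym for defs in g.values() for d in defs for sym in d if is_terminal(sym)
    )
    return {sym for sym, n in counts.items() if n == 1}


def unique_terminal_pairs(g: Grammar) -> Set[Tuple[str, str]]:
    """Adjacent terminal pairs that occur at exactly one place.

    Simplification: only pairs written side by side inside one definition are
    counted; pairs formed across nonterminal boundaries are ignored.
    """
    counts = Counter(
        (d[i], d[i + 1])
        for defs in g.values()
        for d in defs
        for i in range(len(d) - 1)
        if is_terminal(d[i]) and is_terminal(d[i + 1])
    )
    return {pair for pair, n in counts.items() if n == 1}


example = {"<A>": [["a", "b"], ["c", "b"], ["a", "<A>"]]}
print(unique_terminals(example))                # {'c'}
print(sorted(unique_terminal_pairs(example)))   # [('a', 'b'), ('c', 'b')]
```

In this example neither `a` nor `b` is unique by itself, but the pair `a b` is, so recognizing `b` immediately after `a` could mark the current search as required.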

7.5 Conclusion

The  error  recovery  system  discussed  in  previous  chapters  can  be 
applied  to  recursive  descent  parsing  by  means  of  the  special  techniques 
outlined in this chapter. These techniques include marking symbols as required,
marking terminal symbols as uniquely occurring, and building a table
of  following  symbols.   Each  search  is  marked  as  required  or  not  according  to 
these  markers.   A  required  search  which  fails  constitutes  the  detection  of 
an  error.   The  recovery  can  then  follow  essentially  the  same  methods  as 
discussed  in  previous  chapters. 
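To make the mechanism concrete, here is a minimal Python sketch of a recursive descent parser for an invented three-rule grammar. The grammar, the class and method names, and the message texts are all hypothetical. The sketch shows only the required-search flags, the unique-symbol rule for BEGIN (which occurs only in <BLOCK> in this toy grammar), the stack of following-symbol sets, and the two simplest recoveries: inserting a missing required terminal, and skipping to a following symbol when a required identifier is absent. Like the DEMALGOL recursive descent compiler, it has no string generator.

```python
from typing import List, Set

KEYWORDS = {"BEGIN", "END"}


def is_ident(tok: str) -> bool:
    return tok.isidentifier() and tok not in KEYWORDS


class Parser:
    """Recursive descent with required searches, for an invented grammar:

        <BLOCK> ::= BEGIN <STMTS> END
        <STMTS> ::= <STMT> ; <STMTS> | (empty)
        <STMT>  ::= ID := ID
    """

    def __init__(self, tokens: List[str]) -> None:
        self.toks = tokens + ["$"]          # "$" marks the end of the input
        self.pos = 0
        self.messages: List[str] = []
        self.follow: List[Set[str]] = []    # stack of following-symbol sets

    def peek(self) -> str:
        return self.toks[self.pos]

    def term(self, sym: str, required: bool) -> bool:
        if self.peek() == sym:
            self.pos += 1
            return True
        if not required:
            return False                    # not an error: caller tries elsewhere
        # A failed required search is an error; simplest repair: insert it.
        self.messages.append(f"{sym} inserted before {self.peek()}")
        return True

    def ident(self, required: bool) -> bool:
        if is_ident(self.peek()):
            self.pos += 1
            return True
        if not required:
            return False
        # Skip to a symbol in the OR of the stacked following-symbol sets.
        stop = set().union(*self.follow) | {"$"}
        skipped = 0
        while self.peek() not in stop:
            self.pos += 1
            skipped += 1
        self.messages.append(f"skipped {skipped} symbols seeking identifier")
        return True

    def block(self, required: bool) -> bool:
        if not self.term("BEGIN", required):
            return False
        # BEGIN occurs only here, so it makes the <BLOCK> search required.
        self.follow.append({"END"})
        self.stmts(True)
        self.follow.pop()
        self.term("END", True)
        return True

    def stmts(self, required: bool) -> None:
        # An identifier selects the first definition; its remaining symbols
        # are then required exactly when the <STMTS> search is required.
        while is_ident(self.peek()):
            self.follow.append({";"})
            self.stmt(required)
            self.follow.pop()
            self.term(";", required)

    def stmt(self, required: bool) -> None:
        self.ident(required)
        self.term(":=", required)
        self.ident(required)


def parse(tokens: List[str]) -> List[str]:
    p = Parser(tokens)
    p.block(True)
    return p.messages
```

For example, `parse("BEGIN A := B C := D ; END".split())` reports that a semicolon was inserted before `C`, because the `;` after a statement is a required search once <BLOCK> is required.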
CHAPTER 8. DEMALGOL RECURSIVE DESCENT COMPILER

8.1  The  Recursive  Descent  Compiler  Building  System 

A  compiler  employing  most  of  the  system  discussed  in  Chapter  7 
was  implemented  for  the  DEMALGOL  language.   The  compiler  building  system 
that was used is one designed and built by Robert Trout (90) for the ILLIAC
IV Project at the University of Illinois. This compiler building system
uses an extended BNF. It outputs ALGOL procedures for the recognition of
each  nonterminal.   These  procedures  are  merged  with  the  global  scanner  and 
parser  procedures  and  the  semantic  procedures.   Since  Trout's  compiler 
building  system  outputs  ALGOL  code  for  the  syntax  and  not  tables,  there 
were  no  tables  available  for  use  in  string  generation.   Hence  this  DEMALGOL 
compiler  did  not  implement  that  feature  of  the  error  recovery  system. 

8.2  Construction  of  the  DEMALGOL  Compiler 

The  description  of  DEMALGOL  was  given  to  the  compiler  building 
system  and  the  ALGOL  source  code  that  was  produced  was  modified  by  hand  in 
accordance with the methods outlined in Chapter 7. The syntax was analyzed
to  determine  the  settings  of  the  various  flags  and  the  entries  in  the  table 
of  following  symbols.   Also  a  table  of  nonterminal  names  was  prepared  for 
use  in  printing  names  for  the  error  messages.   The  message  printing  and 
recovery  procedures  and  the  filling  of  the  additional  tables  were  added  to 
the  compiler  source  code.   The  parsing  procedures  were  modified  to  contain 
the  flag  setting  and  testing,  stacking  and  unstacking  of  the  following 
symbol  table  pointers,  and  the  calling  of  the  recovery  procedures.  Another 
change that was made was in the initial calling sequence. If the call of
program returned the value false, or returned the value true but the parse
was not at the end of the program, then the next symbol was discarded and
the parse was restarted at that point.
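The restart logic of this modified calling sequence can be sketched as a small driver loop. `parse_program` is a hypothetical stand-in for the generated top-level procedure; the sketch assumes it reports both success and the position at which it stopped, and that "the next symbol" is the symbol at that position.

```python
from typing import Callable, List, Tuple

# parse_program(tokens, start) -> (success, position where the parse stopped)
ProgramParser = Callable[[List[str], int], Tuple[bool, int]]


def drive(tokens: List[str], parse_program: ProgramParser) -> List[int]:
    """Call the top-level procedure until the whole input has been consumed.

    Returns the positions at which the parse was restarted.
    """
    restarts: List[int] = []
    pos = 0
    while True:
        ok, stop = parse_program(tokens, pos)
        if ok and stop >= len(tokens):
            return restarts            # parsed through to the end of the input
        # Failure, or success short of the end: discard the symbol at which
        # the parse stopped and restart just after it.
        pos = max(stop, pos) + 1
        if pos >= len(tokens):
            return restarts            # nothing left to restart on
        restarts.append(pos)
```

Taking `max(stop, pos) + 1` guarantees that every restart discards at least one symbol, so the loop always terminates.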

8.3  Error  Recovery  Results 

The  results  of  the  error  recovery  of  this  compiler  were  not  as 
good  as  those  of  the  modified  Floyd  production  compiler.   This  is  primarily 
the  result  of  not  having  the  string  generator  procedure  available  since 
there  was  no  parsing  table  to  use  to  generate  strings.   Nevertheless,  the 
effectiveness of the recovery was still better than that of the ALGOL
compilers with which it was compared, as will be seen in Section 8.5.

This version of DEMALGOL was not quite the same as the previous one: it
allowed assignment as a primary but did not allow comments. For the following
results, the comments were removed from the programs. This compiler depended
more on semantic tests than the previous one, and this fact affected the
error recovery to some extent. There was a certain amount of backtracking
before the compiler was able to identify an error, which meant that in some
cases the error was detected at a higher level, making the recovery less
effective. In seven cases, the compiler had to restart the parse because, by
the time the error was detected, the only paths available were those which
conclude the parse. In each case this amounted to inserting END. BEGIN,
although it took several messages to say this (one for each symbol and one
for the restart message).
8.4 Summary of the Error Recovery in Seven Programs

The  following  summary  lists  the  errors  which  were  found  in  the 
same  seven  programs  as  were  used  in  the  discussion  of  the  results  of  the 
modified Floyd production DEMALGOL compiler.

; missing (18 times) ... ; inserted (E)
ELSE missing (8 times) ... ELSE inserted (E)
. missing (6 times) ... . inserted (E)
THEN missing (3 times) ... THEN inserted (E)
Initial BEGIN missing ... BEGIN inserted (E)
final END missing ... END inserted (E)
TRY ; ... TRY was meant to be a label ... : inserted (E)
GO TO 6 ... changed 6 to identifier (E)
GO TO 7E ... deleted 7 (E)
END ; ... interchanged and then inserted ; (E and one extraneous error)
END; ... this END matched the initial BEGIN and the previous statement
    was missing a semicolon ... interchanged, inserted a period, and
    restarted the parse (E,E)
Declaration types not valid but allowed syntactically by
<TYPE> ::= <identifier>:
FILE ... semantic routine printed invalid type message (E,E,E, and 11
    extra errors because of the (2,10) part of the FILE declaration)
ELSE ; missing ... inserted ELSE (G)
ENDA ; ... ; missing from previous statement, ENDA intended to be a label
    ... deleted ENDA (G)
I <- I kj; ... J deleted (G)
IF L := 0 THEN ... assignment allowed but a relation is needed ... >
    inserted and then skipped to THEN (0 symbols) seeking an
    <ARITHMETIC PRIMARY> (G,G)
HOME : END ... <STATEMENT> ; missing ... skipped to END (0 symbols)
    seeking a <STATEMENT> and inserted ; (G)
JAN 1 ... skipped 2 symbols seeking <ARITHMETIC PRIMARY> (G)
M ^ THAN 55 ... skipped 2 symbols (THAN 55) seeking
    <ARITHMETIC PRIMARY> (G)
I + A ... the + was meant to be an <- ... at + skipped one symbol seeking
    <STATEMENT> (G)
A+ 1 ... at + skipped one symbol seeking <STATEMENT> (G)
    On each of these, the missing semicolon was correctly inserted and
    this was counted above in the first error reported. However, after
    inserting the semicolon on the first one, the parse terminated and
    was restarted, creating 4 extra errors.
= : ... skipped two symbols seeking <STATEMENT> (G)
R := TRUE; ... TRUE and FALSE are not primaries in this version of
    DEMALGOL, hence they look like identifiers ... at TRUE skipped to ;
    seeking <ARITHMETIC PRIMARY> (F)
THE DAY ... inserted END . and restarted the parse (F and 5 extra errors)
BIN = ... : of assignment operator missing ... error at BIN while seeking
    <STATEMENT>, skipped to semicolon (P)
INTO <- LO ... LO undeclared identifier, should have been 60 ... error at
    INTO while seeking <STATEMENT>, skipped to following symbol (P)
PLUSONE <- TRUE ... error at PLUSONE while seeking <STATEMENT>, skipped to
    following symbol (P)
I := TRUE ... error at I while seeking <BLOCK>, inserted END and . and
    restarted parse, inserting BEGIN (P and 4 extra errors)
ENDOFJOB : END ... label not declared ... inserted END . and restarted
    parse (P and 5 extra errors)

8.5  Comparison  of  These  Results  With  Others 

Table 4 compares these results with those of the modified Floyd production
DEMALGOL compiler, the ALCOR compiler, and the Burroughs ALGOL compiler for
the same programs. Although the compiler with error detection inherent in the
parsing algorithm and a string generator procedure as part of the recovery
mechanism performed much better than the others, the results of the DEMALGOL
compiler built on a recursive descent parsing algorithm are still enough
better than those of the ALGOL compilers to demonstrate that automatic error
recovery is not only possible in recursive descent parsing but can be better
than hand-written error recovery. Its biggest drawback appears to be that it
generates many extraneous errors. All but one of these were the result of
situations in which the parse terminated and had to be reinitiated.
Terminating and reinitiating the parse involved inserting a number of
symbols in order to make the given program look like two programs placed end
to end. Since the extraneous errors all came in groups of three to five
(except the 11 from the FILE declaration), it does not seem that they would
seriously interfere with the programmer's study of the other messages to
find his legitimate errors. If these extraneous errors do prove to be a
problem, one solution might be to suppress all but the first error message
occurring at a particular card location.
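The suppression rule suggested in the last sentence is simple to state as a filter over the message list. This is a sketch under the assumption that each message carries a hashable location, here a hypothetical (card number, column) pair; the thesis does not specify how a location would be represented.

```python
from typing import Hashable, Iterable, List, Tuple


def first_message_per_location(
    messages: Iterable[Tuple[Hashable, str]]
) -> List[Tuple[Hashable, str]]:
    """Keep only the first message reported at each source location."""
    seen = set()
    kept: List[Tuple[Hashable, str]] = []
    for loc, text in messages:
        if loc not in seen:
            seen.add(loc)
            kept.append((loc, text))
    return kept
```

Applied to a restart, which typically produces an END, a period, and a BEGIN message at the same spot, only the first of the group would be printed.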

These results demonstrate that effective automatic error recovery in
recursive descent parsing is definitely possible.


DEMALGOL    DEMALGOL     Illinois     Burroughs
FPL         Recursive    ALCOR        ALGOL
Compiler    Compiler     Compiler     Compiler

Table 4.  Comparison of All DEMALGOL Compilers
CHAPTER 9. LANGUAGE DESIGN CONSIDERATIONS FOR

IMPROVED  RECURSIVE  DESCENT  ERROR  RECOVERY 


9.1  Introduction 

The earlier remarks on language design also apply to the recursive descent
case, with the exception that the order of the productions makes a
difference in recursive descent parsing. The order cannot be changed to be
optimal for error recovery unless the order that is best for error recovery
is also the order required for parsing. This chapter discusses some
additional optimizations of the error recovery for recursive descent
compilers.

9.2 Forcing Required Symbols in Positions of Common Errors

By  arranging  the  syntax  appropriately,  symbols  which  are  commonly 
in  error  can  be  made  to  occur  in  positions  of  required  symbols.   In  the 
following  paragraph,  this  is  illustrated  by  the  commonly  left  out  semicolon. 

In Chapter 4, it was indicated that, to improve the recursive descent
parser, the syntax of DEMALGOL differed from that of ALGOL in that a
semicolon was required to follow each statement instead of separating two
statements. If a semicolon separates statements, then finding a semicolon
indicates to the compiler that another statement is expected. If a semicolon
terminates a statement, then finding a statement indicates to the compiler
that a semicolon is expected. In the first case, the compiler will terminate
the search for a statement list when a semicolon is missing, but if the
beginning of a statement is in error, it will be fixed. In the second case, a
missing semicolon will be inserted, but an incorrect statement beginning
will cause the compiler to conclude its search for a statement list. Since a
missing semicolon was a more common error than an incorrect statement
beginning, the second form of the syntax improved the error recovery. This
illustrates the principle of arranging the syntax so that the more common
errors occur at positions in the syntax which will be required symbols.
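The trade-off between the two forms can be sketched with two toy statement-list parsers. Everything here is an assumption for illustration: a statement is always the three tokens `ID := ID` and is well formed apart from its first symbol, the input ends with a `$` sentinel, and each function returns its messages together with the position where the list search ended.

```python
from typing import List, Set, Tuple


def is_ident(tok: str) -> bool:
    return tok.isidentifier()


def skip_to(toks: List[str], i: int, stop: Set[str]) -> int:
    while toks[i] not in stop:
        i += 1
    return i


def parse_separated(toks: List[str]) -> Tuple[List[str], int]:
    """<LIST> ::= <STMT> | <STMT> ; <LIST>   (the semicolon separates)."""
    msgs: List[str] = []
    i = 0
    while True:
        if is_ident(toks[i]):
            i += 3                                  # ID := ID
        else:
            # A statement is required here, so a bad beginning is caught
            # inside the list and the list search can continue.
            msgs.append(f"<STATEMENT> expected at {toks[i]}")
            i = skip_to(toks, i, {";", "$"})
        if toks[i] != ";":
            return msgs, i      # a missing ";" just ends the list silently
        i += 1


def parse_terminated(toks: List[str]) -> Tuple[List[str], int]:
    """<LIST> ::= <STMT> ; | <STMT> ; <LIST>   (the semicolon terminates)."""
    msgs: List[str] = []
    i = 0
    while is_ident(toks[i]):        # a bad beginning simply ends the list
        i += 3                      # ID := ID
        if toks[i] == ";":
            i += 1
        else:
            msgs.append(f"; inserted before {toks[i]}")   # ";" is required
    return msgs, i
```

With `A := B C := D`, the separator form stops at `C` without a message (the error surfaces later, at a higher level), while the terminator form inserts the semicolon and parses both statements. With a bad statement beginning such as `5 := D`, the roles are reversed.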

9.3 Syntax Description for Longer Required Searches

Since whether a search is required depends not only on the required symbol
flags but also on whether the previous recursion level is required, it is
necessary to arrange the syntax so that required searches reach as low a
level as possible.

Figures 19 and 20 illustrate two different ways of defining a <BLOCK>. In
each case, the underlined symbols are those marked as required. If we assume
that <BLOCK> is called from a non-required search and that BEGIN occurs only
in the places indicated in Figures 19 and 20, then in the example of Figure
19 no subsequent search will be marked as required unless the preprocessor
recognizes that all the BEGINs are in the same set of definitions, in which
case the ";" following <COMMENT> in <BEGIN> will be required. In the example
of Figure 20, by contrast, BEGIN will set <BLOCK> as required, which will in
turn cause <DECLARATION LIST>, <TYPE LIST>, the ";" following <TYPE LIST>,
the ";" following <COMMENT>, <STATEMENT LIST>, the ";" following
<STATEMENT>, and END to be required searches. Clearly, the required searches
will be carried to a much lower level in the second case, and hence the
error recovery will be much more precise than in the first case.
<BLOCK> ::= <BLOCK HEAD> <COMPOUND TAIL>
<BLOCK HEAD> ::= <BEGIN> | <BLOCK HEAD> <TYPE> <TYPE LIST> _;_
<COMPOUND TAIL> ::= <STATEMENT> _;_ <COMPOUND TAIL> | END
<BEGIN> ::= BEGIN <COMMENT> _;_ | BEGIN

Illustration of Poor Syntax for Long Required Searches

Figure 19


<BLOCK> ::= BEGIN <COMMENT OR NOT> <DECLARATION LIST>
            <STATEMENT LIST> END
<COMMENT OR NOT> ::= <COMMENT> _;_ | λ
<DECLARATION LIST> ::= <DECLARATION LIST> <TYPE> <TYPE LIST> _;_ | λ
<STATEMENT LIST> ::= <STATEMENT LIST> <STATEMENT> _;_ | λ

Illustration of Good Syntax for Long Required Searches

Figure 20

The main difference between these two examples is that in the second one
the uniquely occurring symbol BEGIN was made part of a much larger
definition, namely <BLOCK>, so that it caused all the subsequent searches
arising out of the search for a block to be required searches. In the first
example, BEGIN caused only the search for <BEGIN> to be required, and so it
had little effect on the rest of the search for <BLOCK>.
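The contrast can be sketched at the level of the definition in which BEGIN is recognized. The representation of a definition as (symbol, flag) pairs and the flag values below are our reading of the figures, not data from the thesis; the Figure 19 case also assumes the preprocessor has recognized that both BEGINs lie in the definitions of <BEGIN>.

```python
from typing import List, Tuple

# A definition as (symbol, "required symbol" flag) pairs.
Definition = List[Tuple[str, bool]]


def searches_made_required(defn: Definition, unique: str) -> List[str]:
    """Later symbols of `defn` that become required searches once the
    uniquely occurring terminal `unique` is recognized in it.

    Recognizing `unique` marks the current search as required; each later
    symbol is then searched as required iff its own flag is set (the logical
    AND of the two).  Searches outside `defn` are not affected.
    """
    names = [sym for sym, _ in defn]
    start = names.index(unique) + 1
    return [sym for sym, flag in defn[start:] if flag]


fig20_block = [("BEGIN", False), ("<COMMENT OR NOT>", True),
               ("<DECLARATION LIST>", True), ("<STATEMENT LIST>", True),
               ("END", True)]
fig19_begin = [("BEGIN", False), ("<COMMENT>", False), (";", True)]
```

For Figure 20 every remaining symbol of <BLOCK> becomes a required search, and deeper searches such as <TYPE LIST> then inherit that status through the AND rule. For Figure 19 at most the ";" inside <BEGIN> is affected, while <BLOCK HEAD> and <COMPOUND TAIL> lie outside the definition and stay non-required.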

Arranging the syntax so that more symbols are marked as required, and so
that unique terminal symbols cause constructs as global as possible to be
marked as required, can make an important difference in the effectiveness
of the error recovery. This section has shown how the definition of an ALGOL
block can be chosen so that the error recovery is more effective.

9.4 Semantic Setting of the Current Search as Required

Another addition to the system that would improve control over the error
recovery would be the ability, through either semantic routines or markers
in the syntax, to set the required search flags as the unique symbols do. It
may be true that if the parser reaches a certain point in the syntax, then
the parse up to that point is correct, even though the analysis necessary to
determine this automatically is beyond the scope of the preprocessing
program. In this case, a special semantic marker could be inserted in the
syntax by the language designer to have the same effect as the unique symbol
markers and consequently improve the error recovery by extending the
required searches down to lower levels.
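Such a marker could be handled while walking a definition, much as a unique symbol is. In this sketch the marker spelling `!REQ`, the `match` callback, and the function name are all hypothetical, and the per-symbol required flags are ignored for brevity: every symbol after the marker is simply searched as required.

```python
from typing import Callable, List

# Hypothetical marker a language designer could place in a definition to
# assert that a parse reaching this point is correct.
REQUIRED_MARKER = "!REQ"

# match(symbol, required) performs one search and returns False only when a
# non-required search fails (a required failure triggers recovery instead).
Matcher = Callable[[str, bool], bool]


def walk_definition(defn: List[str], required: bool, match: Matcher) -> bool:
    for sym in defn:
        if sym == REQUIRED_MARKER:
            required = True   # same effect as recognizing a unique symbol
            continue
        if not match(sym, required):
            return False      # a non-required search failed: try elsewhere
    return True
```

In a definition written as `FOR <VARIABLE> !REQ := <EXPRESSION>`, the searches for `:=` and <EXPRESSION> would be required even if FOR were not a unique symbol.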

9.5 Uniqueness of Different Alternatives

In  addition  to  having  required  searches  extend  down  to  as  low  a 
level  as  possible,  it  is  also  helpful  to  have  alternative  definitions  differ 
as  close  to  the  beginning  as  possible  so  that  more  of  the  symbols  of  the 
definitions  will  be  required  symbols.   This  will  cause  more  searches  to  be 
required  and  hence  cause  more  errors  to  be  detected  at  lower  levels  so  that 
the  recovery  can  be  more  precise. 
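The effect of where alternatives first differ can be sketched by computing the required-symbol flags from the alternatives of one nonterminal. This is a simplification assumed for illustration: two alternatives are treated as indistinguishable only while their written symbols are identical, so nonterminals whose expansions happen to begin alike are not analyzed.

```python
from typing import List


def common_prefix_len(a: List[str], b: List[str]) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def required_flags(alternatives: List[List[str]]) -> List[List[bool]]:
    """Mark the symbols that are required once their definition is identified.

    A definition is identified by its first symbol not shared, at the same
    position, with any other alternative; every later symbol is required.
    """
    flags = []
    for k, alt in enumerate(alternatives):
        shared = max(
            (common_prefix_len(alt, other)
             for j, other in enumerate(alternatives) if j != k),
            default=0,
        )
        flags.append([i > shared for i in range(len(alt))])
    return flags


late = [["IF", "<E>", "THEN", "<S>"],
        ["IF", "<E>", "THEN", "<S>", "ELSE", "<S>"]]
early = [["WHILE", "<E>", "DO", "<S>"], ["IF", "<E>", "THEN", "<S>"]]
```

For `late` only the final <S> of the second alternative is required, since the two definitions cannot be told apart until ELSE; for `early` every symbol after the first is required in both alternatives.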
9.6 Conclusion

This chapter has discussed four additional suggestions beyond those of
Chapter 6 which can improve the effectiveness of error recovery in the
automatic error recovery scheme outlined in Chapter 7. These have included
(1) putting symbols commonly subject to error in positions of required
symbols, (2) arranging definitions so that required searches are carried to
as low a level as possible, (3) setting the required search flag from
semantic routines, and (4) making alternative definitions differ as close to
the beginning as possible.
CHAPTER 10. FURTHER RESEARCH

10.1 Introduction

This  chapter  presents  several  paths  that  future  research  in  this 
area  could  take.   These  include  more  work  on  the  methods  described  in  this 
thesis,  extension  of  these  methods  to  other  parsing  algorithms,  special 
extensions  for  probabilistic  grammars,  applications  in  computer  science 
education,  and  applications  in  extensible  compilers.   Each  of  these  is 
discussed  in  a  separate  section. 

10.2  Refinements  to  the  Current  System 

Three  additional  avenues  of  research  apply  directly  to  the  systems 
of  automatic  error  recovery  already  discussed.   One  of  these  is  the  area  of 
optimizing  the  speed  of  the  string  generator  procedure.   As  was  indicated 
in Chapter 5, the original three-symbol string generator took so long that
it  caused  OSL  programs  with  many  errors  to  take  about  five  to  ten  times  as 
long  as  error-free  ones,  although  the  other  compilers  were  not  so  adversely 
affected.   This  led  to  shortening  the  string  length  to  two  symbols.   Also 
some  additional  tests  were  inserted  to  prevent  the  string  generator  from 
following  two  or  more  identical  paths.   A  version  of  OSL  with  these  changes 
was  never  implemented  so  there  has  been  no  test  of  the  actual  improvement. 
There  are  other  conditions  which  the  string  generator  might  be  programmed  to 
recognize  which  would  further  reduce  the  time  needed  for  error  recovery. 

The second avenue of research which could be followed is an investigation
of the possibility of having the order of the Floyd productions
automatically optimized on the basis of a record of the number of times each
production was actually applied. This would probably have a greater effect
on the speed of the parser than on the effectiveness of the error recovery.
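A first approximation of such an optimization is to reorder each group of productions by its recorded application counts. This sketch is an assumption, not a method from the thesis: productions are identified by hypothetical string names, and the check that a reordering preserves the parse (Floyd productions are tried in order, so order can matter for correctness) is deliberately not modeled.

```python
from typing import Dict, List


def reorder_group(group: List[str], counts: Dict[str, int]) -> List[str]:
    """Order one group of productions by observed use, most frequent first.

    Python's sort is stable, so productions with equal counts keep their
    original relative order.  Only safe for groups whose productions do not
    depend on being tried in a particular order.
    """
    return sorted(group, key=lambda p: -counts.get(p, 0))
```

Relying on a stable sort keeps the designer's original order as the tie-breaker, which is the least surprising choice when no usage data distinguishes two productions.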

The third area is the implementation and development of the recursive
descent error recovery system, since for this research only a hand-modified
compiler was used. With a compiler building system implemented using this
error recovery system, it would be possible to refine the technique further
and to evaluate its effectiveness relative to existing hand-coded error
detection and recovery systems.

10.3  Extension  to  Other  Parsing  Algorithms 

Another  area  of  research  is  that  of  the  extension  of  the  present 
system to other parsing algorithms such as those based on precedence
techniques. This research has taken two different parsing algorithms and shown
that  the  form  of  automatic  error  detection  and  recovery  described  can  be 
applied  in  both.   It  is  conceivable  that  it  could  be  extended  to  other 
parsing  algorithms  as  well  but  this  would  need  to  be  investigated  in  order 
to  determine  the  extent  to  which  it  can  be  used  and  what  modifications, 
extensions,  or  limitations  are  needed  to  make  it  work. 

10.4 Extension to Probabilistic Grammars

The work of Clarence Ellis (33) on probabilistic grammars suggests a further
question: to what extent could the error recovery be based on the
probabilities of the generated strings in combination with the degree to
which the generated strings match the given string, instead of being based
only on the degree of match? It would be interesting to see whether such
additional information would improve the recovery in situations where it is
currently only fair without degrading it in those where it is currently
excellent.
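One way to pose the question concretely is to rank candidate corrections by a score that mixes both factors. The candidate format, the log-sum score, and the `weight` parameter below are illustrative assumptions only; neither this thesis nor the cited work prescribes them.

```python
import math
from typing import List, Tuple


def choose_correction(candidates: List[Tuple[str, float, float]],
                      weight: float = 1.0) -> str:
    """Pick a candidate string by match quality combined with probability.

    Each candidate is (text, match, prob): match in (0, 1] is how well the
    generated string agrees with the input, and prob > 0 is its probability
    under a probabilistic grammar.  weight=0 reduces to matching alone.
    """
    def score(c: Tuple[str, float, float]) -> float:
        _, match, prob = c
        return math.log(match) + weight * math.log(prob)

    return max(candidates, key=score)[0]
```

Setting `weight` to 0 recovers the current behavior of ranking by match alone, so an experiment could vary it and check whether the fair cases improve while the excellent ones stay unchanged.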

10.5  Applications  in  Computer  Science  Education 

Compilers  which  describe  syntactic  errors  in  ways  that  are  clear 
and  precise  could  be  a  big  help  to  beginning  students.   Some  studies  into 
the  use  of  the  automatic  error  recovery  system  described  in  this  thesis  for 
developing  compilers  for  student  use  could  be  profitable. 

Beginning students in computer science seem to have three main areas of
frustration in learning to program. The first is the discipline of atomizing
the logic of a problem to the level needed to program it; it seems to be
difficult to learn how to describe explicitly the procedure used to solve a
problem. The second area is that of getting the algorithm represented in a
programming language. Misplaced commas and misspelled words are less crucial
in English themes than they are in programming, and it is sometimes
difficult to become accustomed to this precision. The third area is that of
identifying and correcting the logical bugs in the program and in the
algorithm. Certainly this area is not limited to beginning programmers.

Compilers built for computer science education, particularly for beginning
programming courses, using the error detection, recovery, and diagnostic
features outlined in this thesis could help eliminate the frustration that
beginning programmers experience in writing correct statements in the
programming language. Research designed to determine whether such compilers
would significantly reduce the effort needed in learning programming would be
helpful. It might also be interesting to determine whether such compilers
help experienced programmers accomplish their work significantly more
quickly and easily.

10.6  Applications  to  Extensible  Compilers 

Perhaps the most important area of application and further research is that
of extensible compilers, since some form of automatic error recovery is
necessary for an extensible compiler to operate effectively.

Extensible compilers are compiler systems in which a basic or core language
is given to which other language constructs can be added. Theoretically, the
core contains all the primitive syntactic constructs and semantic operators
that are needed for any addition or extension to the language. An area of
computer application which cannot be handled easily by any existing language
is a candidate for an extensible compiler. The additional syntactic
constructs needed for that application are easily specified in terms of the
existing syntactic constructs of the extensible language, and the semantics
associated with the syntactic additions can easily be written in terms of
operators given in the core language. The extensible compiler system merges
these additions into itself, giving a compiler for the new "extended"
language in which problems from the given area of application can then be
programmed.

To a number of people, extensible compilers appear to be the best solution
to the task of providing programming languages for the increasing variety of
problem areas to which computers are being applied. The approach of
producing separate languages and compilers for each problem area is
considered too expensive and limiting, and that of producing one universal
programming language is considered both too large and too rigid to allow for
future changes. These concepts were presented and discussed at a symposium
of the Special Interest Group on Programming Languages of the ACM, and the
papers were printed in volume 4, number 8, August 1969, of SIGPLAN Notices.
As well as discussing the concept of extensible compilers, this symposium
also presented seven extensible languages which were either under
development or operational.

The increasing effort currently being spent on extensible compilers and
extensible languages, plus the fact that an extensible compiler must be based
on some compiler building system in order to be able to build the extensions
into the compiler, make this research on automatic error recovery of greater
importance. If the extensible compiler is to function effectively, it must
be able to handle errors in the extensions effectively. Also, the extensions
will cause constructs which formerly were errors to be valid; hence the error
detection in the basic part of the compiler must itself be dynamic in order
to allow the extensions to be valid. Therefore the only error detection and
recovery mechanism which is suitable for extensible compilers is an
automatic one. Research which would apply the error recovery system outlined
in this thesis to an extensible compiler system would be very much in order.
This research might parallel that described in Section 10.3, namely that of
determining the extent to which this error detection and recovery is
applicable to other parsing algorithms.
10.7 Summary

This chapter has indicated five directions that could be taken in future
research with the system of automatic error recovery described in this
thesis. One of these, refinements of the current system, includes timing
evaluations and speed improvements, automatic optimization based on run-time
records, and implementation of the recursive descent automatic error
recovery system. Perhaps the single most important application of this
system might be in the implementation of extensible compilers. Also of
importance might be the use of a compiler building system with this
automatic error recovery system to build compilers for computer science
education or other student use. Also suggested were the possibilities of
extending the methods to other parsing algorithms and to probabilistic
grammars. The pursuit of any of these areas of further study could be
rewarding and could provide valuable contributions to the field of compiler
construction.
CHAPTER 11. SUMMARY

This thesis has set forth a method of automatic syntactic error recovery, a
discussion of its implementation, a presentation of example results, a
comparison of these results with the results of the error recovery of other
compilers, suggestions for improvement of error recovery in specific
languages, and suggestions for further research.

The first chapter summarized the results of other research and gave the
philosophy of this research. Although there has been much work in some
related areas, there has been very little work on the subject of automatic
error recovery. The basis of the philosophy of this work is the conviction
that the compiler should attempt to find all the programmer's syntactic
errors on the first run and describe them clearly to the programmer. To
accomplish this, the compiler should attempt to diagnose the error situation
to such an extent that it can correct the source program, because it is only
in that context that the compiler can give complete information to the
programmer as to the nature of his error. A method of evaluating and
comparing error recovery systems was devised. This method includes several
variables which can be measured, as well as a functional relationship
between these variables which can reflect the extent to which an error
recovery system approaches the goal of describing all of a programmer's
syntactic errors precisely.

The second chapter presented the translator writing system in
which this work on automatic error recovery was embedded. The TWINKLE
language and compiler for syntax description were mentioned, followed by a
discussion of the conversion from the Backus Naur form created by the
TWINKLE compiler to the modified Floyd production language. This discussion
included the types of Floyd production groups generated and the basis for
them in the Backus Naur form. The parser language and the parsing tables
constructed from the modified Floyd productions were discussed, as well as
the optimization of the parser instruction table. The chapter concluded
with a short comment on the conversion of the parsing table to Burroughs
extended ALGOL source code.
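As a rough illustration of how a parser driven by Floyd productions proceeds, the Python sketch below interprets a handful of production groups. It is not the parser language or the parsing table of the translator writing system. It assumes a simplified encoding in which each production gives a pattern to match against the top of the stack, an optional replacement, a flag saying whether the next input symbol is then scanned onto the stack, and the label of the group to try next. The toy grammar E ::= E + i | i and the labels L0 to L3 are hypothetical. A group in which no production matches is where a syntax error is detected, and therefore where error recovery would take over.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Production:
    pattern: list[str]                 # symbols required on top of the stack
    replacement: Optional[list[str]]   # None leaves the stack unchanged
    scan: bool                         # push the next input symbol afterwards
    next_label: str                    # group to try next ("ACCEPT" stops)


# Hypothetical production groups for the toy grammar E ::= E + i | i.
GROUPS: dict[str, list[Production]] = {
    "L0": [Production(["$"], None, True, "L1")],
    "L1": [Production(["$", "i"], ["$", "E"], False, "L2"),
           Production(["E", "+", "i"], ["E"], False, "L2")],
    "L2": [Production(["E"], None, True, "L3")],
    "L3": [Production(["E", "+"], None, True, "L1"),
           Production(["$", "E", "$"], None, False, "ACCEPT")],
}


def parse(tokens: list[str]) -> tuple[bool, int]:
    """Run the productions over tokens (ending in "$").

    Returns (accepted, index of the last symbol scanned onto the stack).
    """
    stack = ["$"]
    pos = -1
    label = "L0"
    while label != "ACCEPT":
        for prod in GROUPS[label]:
            n = len(prod.pattern)
            if stack[-n:] == prod.pattern:
                if prod.replacement is not None:
                    stack[-n:] = prod.replacement
                if prod.scan:
                    pos += 1
                    stack.append(tokens[pos])
                label = prod.next_label
                break
        else:
            # No production in the group matched: a syntax error is detected.
            return False, pos
    return True, pos
```

Here `parse(["i", "+", "i", "$"])` is accepted, while `parse(["i", "i", "$"])` stops in group L3 with the second `i` as the offending symbol, because only `+` or the end marker may follow an E.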

Chapter 3 described the operation of the error recovery system in
the translator writing system. The operation of the parser instructions
which are specific to error recovery was discussed, as was the operation of
the procedures which are called by these parser instructions. Three types
of error situations were identified by the type of parser instruction in
which they were detected. The first of these was the case of a specific
required terminal symbol. The error recovery in this case was to insert
that symbol in one of three ways: (1) to compare the symbol in error and
the next one or two with a table of legal following symbols, looking for a
simple way of inserting the missing symbol to make the string legal; (2) to
call a string generating procedure which would generate all two- or
three-symbol strings legal at that point in the parse, looking for one which,
when compared with the symbol in error and the next three symbols, would
indicate a way to correct the program; and (3) to insert the missing symbol
in front of the symbol at which the error was detected. The second type of
error situation was the case of a group of productions all of which were to
make the same reduction. In this case, the recovery was either to perform
step (2) above or to make the reduction anyway. The third type of error
situation included all other cases. Here the recovery was first to try
step (2) above and, if that failed, to discard input symbols until it came
to one which would allow it to make a reduction.
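The recovery strategy for the first and third types of error situation can be sketched in a few functions. This Python is a minimal sketch under stated assumptions, not the procedures of the thesis: tokens are plain strings, `FOLLOW` is a small hypothetical stand-in for the table of legal following symbols, the generated legal strings are passed in by the caller rather than produced from the parsing tables, and step (1) is read as either inserting the missing symbol before the symbol in error or substituting it for that symbol. Step (2) accepts a candidate string when, after its first symbol is inserted or substituted, its remaining symbols match the input that follows.

```python
from typing import Callable, Iterable, Optional

Tokens = list[str]

# Hypothetical stand-in for the table of symbols that may legally follow a
# terminal; a real table would be derived from the grammar.
FOLLOW: dict[str, set[str]] = {
    ";": {"BEGIN", "IF", "GOTO", "ID", "END"},
    "THEN": {"BEGIN", "IF", "GOTO", "ID"},
}


def try_legal_strings(tokens: Tokens, pos: int,
                      legal_strings: Iterable[list[str]]) -> Optional[Tokens]:
    """Step (2): match generated legal strings against the symbol in error
    and the next three symbols; return a repaired input or None."""
    window = tokens[pos:pos + 4]
    for cand in legal_strings:
        if len(cand) < 2:
            continue
        rest = cand[1:]
        # Inserting cand[0] before the symbol in error makes the input match.
        if window[:len(rest)] == rest:
            return tokens[:pos] + [cand[0]] + tokens[pos:]
        # Substituting cand[0] for the symbol in error makes the input match.
        if window[1:1 + len(rest)] == rest:
            return tokens[:pos] + [cand[0]] + tokens[pos + 1:]
    return None


def recover_required_terminal(required: str, tokens: Tokens, pos: int,
                              legal_strings: Iterable[list[str]]) -> Tokens:
    """First error type: the terminal `required` was expected at tokens[pos]."""
    follow = FOLLOW.get(required, set())
    cur = tokens[pos]
    nxt = tokens[pos + 1] if pos + 1 < len(tokens) else None
    # Step (1): consult the table of legal following symbols.
    if cur in follow:   # insert `required` before the symbol in error
        return tokens[:pos] + [required] + tokens[pos:]
    if nxt in follow:   # replace the symbol in error by `required`
        return tokens[:pos] + [required] + tokens[pos + 1:]
    # Step (2): compare generated legal strings with the input.
    repaired = try_legal_strings(tokens, pos, legal_strings)
    if repaired is not None:
        return repaired
    # Step (3): insert the missing symbol in front of the symbol in error.
    return tokens[:pos] + [required] + tokens[pos:]


def discard_until(tokens: Tokens, pos: int,
                  allows_reduction: Callable[[str], bool]) -> int:
    """Third error type, after step (2) fails: skip input symbols until one
    allows a reduction. Returns its index, or len(tokens) if none does."""
    while pos < len(tokens) and not allows_reduction(tokens[pos]):
        pos += 1
    return pos
```

For example, `recover_required_terminal(";", ["ID", ":=", "NUM", "IF", "ID"], 3, [])` inserts the semicolon in front of `IF`. The fallback `discard_until` simply skips symbols until the caller-supplied predicate, a stand-in for consulting the parsing table, reports that a reduction is possible.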

The basic test and development language DEMALGOL was described in
Chapter 4. The results of the error recovery of the DEMALGOL compiler on
seven small programs were presented. The results of running the same seven
programs on two ALGOL compilers were compared with the results of the
DEMALGOL compiler and summarized in Table 2.

Three additional languages were implemented on the translator
writing system described in Chapter 2. These three, TESLA, ICL, and OSL,
were discussed in Chapter 5 along with examples of the error recovery of
these three compilers. In one case, ICL, there was also another compiler
with which the results could be compared. These results, along with those
of DEMALGOL, were summarized in Table 3.

Chapter  6  offered  several  ways  in  which  error  recovery  could  be 
improved  for  any  compiler  constructed  with  this  translator  writing  system. 

The  system  of  error  recovery  outlined  in  Chapter  3  could  be 
extended  to  work  in  a  recursive  descent  parsing  algorithm.   Chapter  7 
presented  an  algorithm  for  error  detection  appropriate  for  recursive  descent 
parsing  which  would  enable  the  automatic  error  recovery  system  to  be  used. 
This  algorithm  provided  a  means  of  marking  subgoals  as  required,  causing  an 
error  to  be  detected  if  the  subgoal  were  not  found. 
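A toy recursive descent parser shows how marking subgoals as required turns a failed match into a detected error. In this Python sketch (hypothetical grammar: a statement is IF expr THEN statement, or ID := expr), an alternative that fails on its first symbol simply returns False so the caller can try another alternative, but once an alternative has been committed to, its remaining subgoals are marked required. Which subgoals to mark, and the use of simple insertion as the recovery, are assumptions made for the example; the Chapter 7 algorithm and its interface to the error recovery procedures are not reproduced here.

```python
class RDParser:
    """Toy recursive descent parser in which subgoals can be marked required.

    Hypothetical grammar:
        stmt ::= IF expr THEN stmt | ID := expr
        expr ::= ID | NUM
    """

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0
        self.errors: list[tuple[int, str]] = []

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "$"

    def fail(self, goal: str, required: bool) -> bool:
        # An optional subgoal may fail quietly so the caller can try another
        # alternative; a required subgoal that is not found is a syntax error.
        if not required:
            return False
        self.errors.append((self.pos, goal))
        return True  # recover by assuming the missing goal was present

    def term(self, sym: str, required: bool = False) -> bool:
        if self.peek() == sym:
            self.pos += 1
            return True
        return self.fail(sym, required)

    def expr(self, required: bool = False) -> bool:
        if self.term("ID") or self.term("NUM"):
            return True
        return self.fail("<expr>", required)

    def stmt(self, required: bool = False) -> bool:
        if self.term("IF"):
            # Once IF has been seen, the rest of the alternative is required.
            self.expr(required=True)
            self.term("THEN", required=True)
            return self.stmt(required=True)
        if self.term("ID"):
            self.term(":=", required=True)
            return self.expr(required=True)
        return self.fail("<stmt>", required)
```

For the input `IF ID ID := NUM`, the missing `THEN` is reported at position 2 and the parse continues as if it had been present, while an input such as `NUM` makes the optional `stmt` goal fail without reporting any error.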

A DEMALGOL compiler built with this recursive descent error
detection system and automatic error recovery was described in Chapter 8.
The results of this compiler on the same DEMALGOL programs used earlier were
compared with the results of the other DEMALGOL and ALGOL compilers and
summarized in Table 4.
Just as Chapter 6 had offered ways to improve the error recovery of
a modified Floyd production language compiler, Chapter 9 offered
additional suggestions for improving the error recovery in recursive descent
compilers built using the system described in Chapter 7.

Several  further  areas  of  research  involving  the  systems  described 
in  this  thesis  were  discussed  in  Chapter  10,  the  primary  ones  probably  being 
the  applications  to  extensible  compilers  and  the  applications  in  computer 
science  education. 

The  system  of  automatic  error  recovery  described  in  this  thesis 
seems  to  be  a  very  effective  system  by  all  the  comparisons  and  tests  that 
have  been  discussed.   It  has  proven  superior  to  all  other  systems  with  which 
it  has  been  compared  in  its  ability  to  recover  from  errors  in  such  a  way  as 
to  find  all  the  errors  in  most  programs  and  to  describe  most  of  the  errors 
exactly.   Thus  it  has  been  shown  to  be  superior  in  the  effectiveness  of  the 
resulting  error  recovery  by  the  definition  of  effectiveness  given  in  this 
thesis. 
APPENDIX B
ALCOR-ILLINOIS ERROR RECOVERY EXAMPLES

'LABEL' not valid ... "illegal occurrence of char. 'LABEL' in or after statement following a block begin" (E and 2 extra errors)

'GO TO' B)( ... )( should be ., ... "illegal occurrence of char. ) in or after GO TO statement. It has been dropped from the source program." (G)

)( ... "illegal occurrence of char. ( . Only a ., or -END- is expected." (E)

'IF' P 'BEGIN' ... "illegal occurrence of char. 'BEGIN' in or after 'IF'" (F and 1 extra error)

I .= I + 1J ., ... "a sequence of digits is followed by the letter J" (G)

'GO TO D.= 'GO TO' ... ' missing after TO and .= should be ., ... "illegal occurrence of char. 'GO TO' in or after assignment statement" (G)

'GO TO' 6 ... "identifier 6 is not declared or is undefined at this point" (E)

'GO TO' 7E ... "a sequence of digits is followed by the letter E" (E)

B.. R.='TRUE' ... "identifier / was used twice with the same wrong type or kind" (extra error)

I .= 'TRUE' ... "type of left side of .= differs from right side" (E)

no 'FINIS' (7 times) ... "no FINIS at the end of program" (E and 1 error missed)

'LABEL' not a valid type (5 times) ... "illegal occurrence of char. 'LABEL'. It should only be in a specification part." (E with 23 extra errors and 1 missed error)


., missing after 'GO TO' MIX ... "identifier MIXENDA is not declared or is undefined at this point" (G and 1 extra error)

ENDA ., missing ... "A subexpression in an expression should be arithmetic" (P)

., missing after SUM.=SUM+2 (7 times) ... "a sequence of digits is followed by the letter S" (G)

statements after final 'END' ... "program not a compound statement or block" (F)

'IF'M'NE'THAN 55 'THEN' ... "identifier THAN55 is not declared or is undefined at this point" (G and 5 extra errors)

., missing three cards before N.= ... "illegal occurrence of char. after ident., number or array element which is in or after a type declaration" (F)

'IF' L.=0 'THEN' ... "illegal occurrence of char. = in or after 'IF'" (G and 1 extra error)

., missing before 'COMMENT' ... "'COMMENT' does not follow 'BEGIN' or .,." (E)

., missing before 'BOOLEAN' ... "illegal occurrence of char. 'BOOLEAN' after ident., number or array element which is in or after a type declaration" (G)

., missing before 'LABLE' ... "undefined delimiter" (E and 1 missed error)

YR.,70., ... should be YR.=70., ... "Identifier YR is not procedure without parameters" (G and 1 extra error)

'IF'M0'EQ2'THEN' ... "undefined delimiter" reference to a line of twenty syntactic quantities (F and 1 extra error)


'THEN' missing ... "illegal occurrence of char. 'BEGIN' after ident., number or array element which is in or after a rel. oper. of an expression" (F)

., missing ... "illegal occurrence of char. 'GO TO' after ident., number or array element which is in or after a assignment statement" (F)

'END' NMINUSI .= ... ., missing after 'END' ... "The char. = appeared in comment after 'END'. Perhaps a ., is missing. Translation stopped" (E with 2 extra errors and 6 missed errors)

I 'NED' ... should be 'END' ... "illegal occurrence of char. undefined delimiter in or after statement following a block begin. It has been dropped from the source program." (G with 1 extra error)

initial 'BEGIN' missing ... "program not a compound statement or block" (F)

'FILE' DATA (2,10) ., ... "illegal occurrence of char. undefined delimiter in or after statement following a begin. It has been dropped from the source program." (E with 7 extra errors)

I+A ... should be I.=A., ... "illegal occurrence of char. + in or after statement following a block begin" (G)

A+1 ... should be A.=1., ... "identifier AA is not declared or is undefined at this point" (G)

., missing after A+1 ... "illegal occurrence of char. 'IF' in or after + or - of an expression" (G)


L0 ... should be 60 ... "identifier LO is not declared or is undefined at this point" (E)

., missing before 'IF' ... "illegal occurrence of char. 'IF' in or after + or - of an expression" (G)

((YR-Z 'IF' ... spoiled card left in deck ... "illegal occurrence of char. 'IF' in or after + or - of an expression" (G)

) missing (2 times) ... "missing ) before char. .," (E, E)

'THEN' missing ... "illegal occurrence of char. ., in or after 'IF'. The -THEN- is missing." (E)

., missing before 'COMMENT' ... "'COMMENT' does not follow 'BEGIN' or .," (E)
[Syntax listing, largely illegible in this copy of the thesis. Recoverable nonterminals include <CONTROL STATEMENT>, <FEEDBACK ASSIGNMENT>, <FEEDBACK SPECIFICATIONS>, <DISPLAY STATEMENT>, <COMPARISON STATEMENT>, <REPETITION SPECIFIER>, <INPUT SPECIFICATIONS>, <INPUT DESIGNATOR>, <INPUT GROUP ASSIGNMENT>, <CASE LIMITS>, <DATA PATTERN SPECIFICATIONS>, <BIT PATTERN>, <INVERSION INDICATOR>, <OUTPUT DATA SPECIFICATIONS>, <OUTPUT GROUP DESIGNATOR>, <STORAGE GROUP ASSIGNMENT>, <STORAGE SIGNAL LIST>, and <EXPECTED OUTPUT>.]
ICL SYNTAX

/SYNTAX ;

* PSWD

LANGUAGE : ICL ;
DOM ZIP TD ISL^

SPECIAL SYMBOLS : COMMENT, BEGIN, END, DO, THEN, ELSE ;

<PROGRAM> ::=
    <BLOCK> ;

C ***************************************************************** ;

<DECLARATION> ::=
    <RESULT DECLARATION> /
    <PROGRAM DECLARATION> /
    <FILE DECLARATION> ;

<RESULT DECLARATION> ::=
    RESULT LIST <*I> SEPARATED BY , ;

<PROGRAM DECLARATION> ::=
    [ ILLIAC ? PROGRAM /
      B6500 PROGRAM /
      JOB ? PARTMFR /
      COMPILER ]
    LIST [ <*I>
           [ <PARAMETER LIST>
             <LABEL EQUATION>
           / <PARAMETER LIST> ] ]
    SEPARATOR , ;

<FILE DECLARATION> ::=
    B6500 <B6500 FILE MODE> FILE
    LIST [ <*I> / <*I> <LABEL EQUATION> ]
    SEPARATOR , /
    ABSOLUTE ? ILLIAC ? <ILLIAC FILE MODE>
    LIST [ <*I> <DISK LAYOUT> / <*I> ]
    SEPARATOR , ;

<B6500 FILE MODE> ::=
    SINGLE PRECISION ?
    DOUBLE PRECISION ?
    PACKED /
    UNPACKED ? ;

<ILLIAC FILE MODE> ::=
    BYTE FILE /
    UNSIGNED FILE /
    [ SHORT ? [ INTEGER / REAL ? ] ] ? FILE ;

C ***************************************************************** ;

<DISK LAYOUT> ::=
    <*N> ? [ <UNIT> ? <NON LIST REQUEST> /
    #[ <DISK SPACE> #] ] ;

<DISK SPACE> ::=
    <REQUESTS> / <NON LIST REQUEST> /
    <UNIT> ? <NON LIST REQUEST> /
    <SIMPLE REQUEST> /
    <ITERATION REQUEST> ;

<SIMPLE REQUEST> ::=
    <*N> <QUANTIFICATION> ? ;

<NON LIST REQUEST> ::=
    SAME AS <B6500 FILE ID> <CONTIGUOUS> ? /
    BREAK FILE ;

<CONTIGUOUS> ::=
    <*I> ;

<UNIT> ::=
    <EU> <SU> ? /
    <SU> <EU> ? ;

<EU> ::=
    [ EUA / EUB / EU0 / EU1 ] ;

<SU> ::=
    [ SUA / SUB / SUC / SUD / SUE / SUF /
      SU0 / SU1 / SU2 / SU3 / SU4 / SU5 ] ;

<QUANTIFICATION> ::=
    <*I> PT STARTSWITHLETTERC /
    <*I> PT STARTSWITHLETTERS [ <*I> PT STARTSWITHLETTERC ] ? ;

<MULTIPLE> ::=
    <*N> ;

<REQUESTS> ::=
    LIST <REQUEST> SEPARATOR , ;

<REQUEST> ::=
    <PHASED REQUEST> /
    <SIMPLE REQUEST> /
    <ITERATION REQUEST> /
    <UNIT> ? <MULTIPLE> #( <REQUESTS> #) ;

<PHASED REQUEST> ::=
    <ADDRESS> #: <SIMPLE REQUEST> ;

<ADDRESS> ::=
    <*N> <*I>
    <*N> ;

<ITERATION REQUEST> ::=
    <NUMBER> #[ <ADDRESS> #, <DELTA> #, <SIMPLE REQUEST>

<DELTA> ::=
    <*N> <*T> ? ;

<*N> ;

<PARAMETER LIST> ::=
    #( LIST <*I> SEPARATED BY , #) /
    EMPTY ;

C ***************************************************************** ;

<STATEMENT> ::=
    <IF STATEMENT> /
    <BLOCK> /
    <ASSIGNMENT STATEMENT> /
    <REMOVE STATEMENT> /
    <QUIT STATEMENT> /
    <PRINT STATEMENT> /
    <MOVE STATEMENT> /
    <TO PROGRAM STATEMENT> /
    <FROM PROGRAM STATEMENT> /
    <EXECUTION STATEMENT> /
    <DURING STATEMENT> /
    <FORK STATEMENT> /
    <CASE STATEMENT> /
    EMPTY ;

<IF STATEMENT> ::=
    IF <EXPRESSION>
    THEN <STATEMENT> [ PT NOTELSE /
    ELSE <STATEMENT> ] ;

<ASSIGNMENT STATEMENT> ::=
    <IDENTIFIER> PT RESULT
    #:= <EXPRESSION> ;

<MOVE STATEMENT> ::=
    <ILLIAC FILE ID> #:= <B6500 FILE ID> /
    <B6500 FILE ID> #:= <ILLIAC FILE ID> ;

<EXECUTION STATEMENT> ::=
    <EXECUTION STEP> /
    TRY [ <ASSIGNMENT STEP> /
          <DURING STATEMENT> /
          <EXECUTION STATEMENT> ]
    Uh <STATEMENT>
    [ PT NOTELSE /
      ELSE <STATEMENT> ] ;

<ASSIGNMENT STEP> ::=
    <IDENTIFIER> PT RESULT
    #:= <EXECUTION STEP> ;

<DURING STATEMENT> ::=

    DURING
    [ <EXECUTION STATEMENT> / <ASSIGNMENT STEP> ]
    DO #BEGIN
    [ <DURING STATEMENT LIST> WHICH IS
      LIST <STATEMENT> SEPARATED BY #; ] #END ;

<ALL OF> ::= #ALL #OF /
    #ALL ;

<FORK STATEMENT> ::=
    <ALL OF> #BEGIN
    [ <FORK STATEMENT LIST> WHICH IS
      LIST <STATEMENT> SEPARATED BY #; ] #END ;

<CASE STATEMENT> ::=
    CASE <EXPRESSION> #OF #BEGIN
    [ <CASE STATEMENT LIST> WHICH IS
      LIST <STATEMENT> SEPARATED BY #; ] #END ;

<CASE EXPRESSION> ::=
    CASE <EXPRESSION> #OF #(
    [ <CASE EXPRESSION LIST> WHICH IS
      LIST <EXPRESSION> SEPARATED BY , ] #) ;

<IF EXPRESSION> ::=
    IF <EXPRESSION>
    THEN <EXPRESSION>
    ELSE <EXPRESSION> ;

<TO PROGRAM STATEMENT> ::=
    <IDENTIFIER> PT INDURINGSTATEMENTANDTHISISTHEDURINGPROGRAM
    #:= <EXPRESSION> ;

<FROM PROGRAM STATEMENT> ::=
    <IDENTIFIER> PT RESULT
    #:= #FROM <IDENTIFIER> PT ISTHISTHEDURINGPROGRAM ;

<PRINT STATEMENT> ::=
    PRINT <*S> ;

<QUIT STATEMENT> ::= STOP <*N> / STOP /
    ERROR <*N> / ERROR ;

<REMOVE STATEMENT> ::=
    REMOVE
    LIST [ <ILLIAC FILE ID> / <ILLIAC PROGRAM ID> ]
    SEPARATED BY , ;

<BLOCK> ::=
    #BEGIN
    [ <BLOCK LIST> WHICH IS
      LIST [ <DECLARATION> / <STATEMENT> ]
      SEPARATED BY #; ] #END ;

C ***************************************************************** ;

<LABEL EQUATION> ::=
    LIST [ ANY NONCHARACTER BUT <*M> BUT <*S> ]
    SEPARATOR #/ <*S> ? ;

C ***************************************************************** ;

<EXPRESSION> ::=
    <IDENTIFIER> PT RESULT
    #:= <EXPRESSION> /



    EXPRESSION> /
    <IF EXPRESSION> /
    #+ <IF EXPRESSION> / - <IF EXPRESSION> /
    #NOT <IF EXPRESSION> /
    <BOOLEAN TERM> [ #OR <BOOLEAN TERM> ?5CFM!T0<0RV*O#-l) ]* ;

<BOOLEAN TERM> IS A <BOOLEAN FACTOR> FOLLOWED BY
    [ #AND <BOOLEAN FACTOR> ]* ;

<BOOLEAN FACTOR> IS A <RELATION TERM> FOLLOWED BY POSSIBLY ONE
    [Alternatives for the relational operators (LSS, GTR, and others), each followed by <RELATION TERM>; the column layout of this part of the listing is illegible in the scan.]

<RELATION TERM> IS AN <ARITHMETIC TERM> FOLLOWED BY
    [ + <ARITHMETIC TERM> /
      - <ARITHMETIC TERM> ]* ;

<ARITHMETIC TERM> IS A <PRIMARY> FOLLOWED BY
    [ * <PRIMARY> /
      #/ <PRIMARY> ]* ;

<PRIMARY> ::=
    #+ <PRIMARY> / - <PRIMARY> /
    #NOT <PRIMARY> /
    <IDENTIFIER> PT RESULT /
    TRUE /
    FALSE /
    <*N> /
    #( <EXPRESSION> #) /
    <EXECUTION STEP> /
    [ MIN / MAX ] #(
    [ <MIN MAX EXPRESSION LIST> WHICH IS
      LIST OF <EXPRESSION>
      SEPARATED BY COMMAS STARTING WITH <EXPRESSION> ] #) /
    <CASE EXPRESSION> ;

<IDENTIFIER> ::= <*I> ;

<ILLIAC FILE ID> ::= <IDENTIFIER> PT ILLIACFILEID ;

<ILLIAC PROGRAM ID> ::= <IDENTIFIER> PT ILLIACPROGRAMID ;

<B6500 FILE ID> ::= <IDENTIFIER> PT B6500FILEID ;

<EXECUTION STEP> ::=
    <COLLECT STATEMENT> /
    <IDENTIFIER> PT ILLIACPROGRAMORB6500PROGRAMORCOMPILER
    <ACTUAL PARAMETER PART>
    [ WITH <IDENTIFIER> <ACTUAL PARAMETER PART> ] ? ;

<ACTUAL PARAMETER PART> ::=
    #( LIST [ <*S> /
              DUMPFILE = <ILLIAC FILE ID> /
              <*I> = <*I> ]
    SEPARATOR , #) /
    EMPTY ;

<COLLECT STATEMENT> ::=
    COLLECT #( LIST <B6500 FILE ID> SEPARATOR , #)
    INTO <B6500 FILE ID> ;
